A common practice in obtaining a semiparametric efficient estimate
is to iteratively maximize the (penalized) log-likelihood
w.r.t. its Euclidean parameter and functional nuisance parameter via the
Newton-Raphson algorithm. The purpose of this paper is to provide
a formula for calculating the minimal number of iterations $k^*$ needed
to produce an efficient estimate $\hat\theta_n^{(k^*)}$ from a theoretical point of view.
We discover that (a) $k^*$ depends on the convergence rates of the initial
estimate and the nuisance estimate; (b) more than $k^*$ iterations, i.e.,
$k$, will only improve the higher order asymptotic efficiency of $\hat\theta_n^{(k)}$; (c)
$k^*$ iterations are also sufficient for recovering the estimation sparsity
in high dimensional data. These general conclusions hold, in particular,
when the nuisance parameter is not estimable at the root-n rate,
and apply to semiparametric models estimated under various regularizations,
e.g., kernel or penalized estimation. This paper provides a
first general theoretical justification for the "one-/two-step iteration"
phenomena observed in the literature, and may be useful in reducing
the bootstrap computational cost for semiparametric models.



1. Introduction. Semiparametric models indexed by a Euclidean parameter
of interest $\theta \in \Theta \subset \mathbb{R}^d$ and an infinite-dimensional nuisance
parameter $\eta \in \mathcal{H}$ are proven to be useful in a variety of contexts, e.g.,
[H, @, QjJ, 23, 27, 29, 34, 39]. The semiparametric MLE for $\theta$ can be viewed as a
solution of the implicitly defined efficient score function, whose nonparametric
estimation is only possible in some special cases, e.g., [23]. Therefore, it is
generally hard to solve for the MLE from the efficient score function analytically
or numerically. A common practice is to maximize the log-profile likelihood

(1)    $\log pl_n(\theta) = \sup_{\eta \in \mathcal{H}} \log lik_n(\theta, \eta),$

where $lik_n(\theta, \eta)$ is the likelihood given $n$ data, via some optimization
algorithm. For example, the Newton-Raphson algorithm is applied to the partial
likelihood of the Cox model in the software R (with the command coxph).

A general algorithm for obtaining a semiparametric efficient estimate of $\theta$
is to iteratively maximize the log-likelihood w.r.t. $\theta$ and $\eta$ as follows:

General Semiparametric Iterative Estimation Algorithm

I. Identify an initial estimate $\hat\theta_n^{(0)}$;

II. Construct the corresponding nuisance estimate $\hat\eta(\hat\theta_n^{(0)})$ either by a pure
nonparametric approach, e.g., isotonic estimation, or under some
regularization, e.g., kernel or sieve estimation;

III. Apply the Newton-Raphson (NR) or another optimization algorithm to

(2)    $\hat S_n(\theta) = \log lik_n(\theta, \hat\eta(\theta)),$

at $\theta = \hat\theta_n^{(0)}$ to obtain $\hat\theta_n^{(1)}$;

IV. Repeat steps II-III for $k^*$ iterations until

    $|\hat S_n(\hat\theta_n^{(k^*)}) - \hat S_n(\hat\theta_n^{(k^*-1)})| < \epsilon$

for some pre-determined sufficiently small $\epsilon$.
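Steps I-IV can be sketched in code. The following is a minimal illustration under stated assumptions, not the implementation used in the paper: it assumes a toy model (i.i.d. normal data with mean $\theta$, where the variance plays the role of the nuisance $\eta$), for which the nuisance estimate and the derivatives of the criterion are available in closed form. The function names `eta_hat`, `s_hat`, `newton_step` and `iterate` are hypothetical.

```python
import math

def eta_hat(theta, data):
    """Step II (toy stand-in): nuisance estimate for fixed theta,
    here the variance MLE of N(theta, eta) data."""
    return sum((x - theta) ** 2 for x in data) / len(data)

def s_hat(theta, data):
    """Criterion (2): log-likelihood with eta replaced by eta_hat(theta)."""
    n = len(data)
    return -0.5 * n * (math.log(2 * math.pi * eta_hat(theta, data)) + 1.0)

def newton_step(theta, data):
    """Step III: one Newton-Raphson update of s_hat, using the closed-form
    first and second derivatives available in this toy model."""
    n = len(data)
    d = sum(data) / n - theta            # sample mean minus theta
    eta = eta_hat(theta, data)
    first = n * d / eta                  # derivative of s_hat
    second = n * (2 * d * d - eta) / eta ** 2
    return theta - first / second

def iterate(theta0, data, eps=1e-10, max_iter=50):
    """Steps I-IV: repeat II-III until the criterion changes by less than eps.
    Returns the final estimate and the number of iterations used."""
    theta, value = theta0, s_hat(theta0, data)
    for k in range(1, max_iter + 1):
        theta_new = newton_step(theta, data)
        value_new = s_hat(theta_new, data)
        if abs(value_new - value) < eps:
            return theta_new, k
        theta, value = theta_new, value_new
    return theta, max_iter

data = [1.0, 2.0, 4.0, 5.0]              # sample mean 3.0
theta_hat, k_used = iterate(2.5, data)
print(theta_hat, k_used)
```

In this toy model the maximizer of the criterion is the sample mean, so the loop can be checked directly. The stopping rule mirrors step IV; the choice of `eps` is arbitrary here, which is exactly the practice the paper sets out to replace with a theoretical value of $k^*$.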
Note that $\hat S_n(\theta)$ defined in (2) is also called the generalized profile
likelihood in [38]. If $\hat\eta(\theta)$ is the nonparametric MLE (NPMLE) for any fixed
$\theta$, then $\hat S_n(\theta)$ is just the profile likelihood defined in (1). The above
likelihood estimation procedure or its M-estimation analog has been extensively
implemented in the literature. Here is an incomplete list: (i) Odds-Rate
Regression Model under Survival Data, e.g., [23, 29]; (ii) Semiparametric
Regression under Shape Constraints, e.g., [J, 11]; (iii) Logistic Regression
with Missing Covariates, e.g., Roeder, Carroll and Lindsay (1996); (iv) Generalized
Partly Linear (Single Index) Model, e.g., [1, HJ]; (v) Conditionally Parametric
Model, e.g., [H, 39]; (vi) Semiparametric Transformation Model, e.g., [27]. In
addition, the above iterative procedure can also be adapted to the penalized
estimation and selection of semiparametric models by using a criterion function
different from (2), see [7, la, 28]. We will discuss that scenario in Section 4.
However, in all the above papers, $k^*$ or $\epsilon$ is arbitrarily chosen in practice.

The main purpose of our paper is to answer "How Many Iterations Do
We Really Need in Semiparametric Estimation?" from a theoretical point
of view. We provide a general formula for calculating the minimal number
of iterations $k^*$ needed to produce a semiparametric efficient $\hat\theta_n^{(k^*)}$.
Specifically, we discover that (a) $k^*$ depends on the convergence rates of
$\hat\theta_n^{(0)}$ and $\hat\eta(\theta)$; (b) more than $k^*$ iterations, i.e., $k$, will not change the
limiting distribution of $\hat\theta_n^{(k)}$, but will improve its higher order asymptotic
efficiency; (c) $k^*$ iterations are also sufficient for recovering the estimation
sparsity under high dimensional data. These general conclusions hold, in particular,
when the nuisance parameter is not estimable at the root-n rate, and apply to
semiparametric models estimated under various regularizations, e.g., kernel
or penalized estimation. Note that the convergence rate of the regularized
estimate $\hat\eta(\theta)$ is determined by the related smoothing parameters, e.g., the
bandwidth order in kernel estimation. Moreover, our construction of the
efficient estimate does not require knowing the form of the implicitly defined
efficient score function or applying the sample splitting technique and the
drop-one-out trick required in the classical literature, i.e., [a, 25, 35, 36]. A general
strategy for identifying $\hat\theta_n^{(0)}$ with a proper convergence rate is also considered.
The technical challenge of this paper is that $\hat S_n(\theta)$ in practice may not have
an explicit form or may not be continuous/smooth.

As far as we are aware, our paper provides a first general theoretical
justification for the "one-/two-step iteration" phenomenon, i.e., $k^* = 1, 2$,
observed in the semiparametric literature. However, we find that more
iterations are absolutely necessary if $\eta$ is estimated at a very slow rate. For
example, we need 8 iterations to achieve efficiency in conditionally
exponential models, see Table 3. Moreover, our results are readily extended to
bootstrap estimation by combining them with the most recent bootstrap
consistency results obtained for semiparametric models. Therefore, we
expect to significantly reduce the bootstrap computational cost, which is very
high in semiparametric models, after knowing $k^*$ for each bootstrap sample.
See 0] for similar ideas applied to parametric models. Due to
space limitations, we only consider the NR algorithm based on the original
sample in this paper, but note that extensions to slight modifications
of NR are possible by considering the discussions on Page 534 of [3^].

Section 2 provides some necessary background material on semiparametric
estimation. In Section 3, we consider the semiparametric maximum
likelihood estimation in which $\hat S_n(\theta)$ is the possibly nonsmooth log-profile
likelihood (1). In Section 4, we consider the semiparametric estimation under
two types of regularization, i.e., kernel estimation and penalized estimation,
in which $\hat S_n(\theta)$ is smooth. In that section, we also consider the sparse and
efficient estimation of the partial linear models as an important application
of penalized estimation. In Section 5, we propose two grid search algorithms
for identifying the initial estimate, whose convergence rate will be rigorously
proven. Several semiparametric models ranging from survival models and
mixture models to conditionally exponential models are treated to illustrate the
applicability of our theories. All the proofs are postponed to the Appendix.

2. Preliminary. We assume that the data $X_1, \ldots, X_n$ are i.i.d. throughout
the paper. In what follows, we first briefly review the concepts of the
efficient score function and the least favorable curve (LFC), and then
relate the estimation of the LFC to that of $\theta$ as discussed in [38]. Unless otherwise
specified, the notation $E$ is reserved for the expectation taken under $(\theta_0, \eta_0)$.
The score functions for $\theta$ and $\eta$ are defined as, respectively,

       $\dot\ell_0(X_i) = \frac{\partial}{\partial\theta}\Big|_{\theta=\theta_0} \log lik(X_i; \theta, \eta_0),$

(3)    $A_{\theta_0,\eta_0} h(X_i) = \frac{\partial}{\partial t}\Big|_{t=0} \log lik(X_i; \theta_0, \eta(t)),$

where $h$ is a "direction" along which $\eta(t) \in \mathcal{H}$ approaches $\eta_0$ as $t \to 0$.
$A_{\theta_0,\eta_0}: H \mapsto L_2^0(P_{\theta_0,\eta_0})$ is the score operator for $\eta$, where $H$ is some
closed and linear direction set. The efficient score function $\tilde\ell_0$ is defined as
the residual of the projection of $\dot\ell_0$ onto the tangent space $\mathcal{T}$, which is defined
as the closed linear span of the tangent set
$\{A_{\theta_0,\eta_0}\mathbf{h} = (A_{\theta_0,\eta_0}h_1, \ldots, A_{\theta_0,\eta_0}h_d)' : h_j \in H\}$.
Therefore, we can write the efficient score function at $(\theta_0, \eta_0)$ as

(4)    $\tilde\ell_0 = \dot\ell_0 - \Pi_0\dot\ell_0,$

where $\Pi_0\dot\ell_0 = \arg\min_{k \in \mathcal{T}} E\|\dot\ell_0 - k\|^2$. The variance of $\tilde\ell_0$ is defined as the
efficient information matrix $\tilde I_0$. The inverse of $\tilde I_0$ is shown to be the Cramér-Rao
bound for estimating $\theta$ in the presence of an infinite dimensional $\eta$, see 0].

A main idea of estimating $\theta$ is to reduce a high dimensional semiparametric
model to a low dimensional random submodel of the same dimension as
$\theta$, called the least favorable submodel (LFS). The LFS can be constructed
as $t \mapsto \log lik(t, \eta^*(t))$ and satisfies

(5)    $\eta^*(\theta_0) = \eta_0$

and

(6)    $\frac{\partial}{\partial t}\Big|_{t=\theta_0} \log lik(t, \eta^*(t)) = \tilde\ell_0.$

Note that the LFS may not exist unless $\Pi_0\dot\ell_0$ can be expressed as a nuisance
score (the tangent set is closed). In all our examples, the LFS exists or can
be approximated sufficiently closely. The $\eta^*(t)$ in the LFS is called the
least favorable curve. Under regularity conditions, it is shown that

(7)    $\eta^*(t) = \arg\sup_{\eta \in \mathcal{H}} E \log lik(t, \eta)$ for any fixed $t \in \Theta$.

By (7) and standard arguments, we can establish that the maximizer of

       $S_n(\theta) = \sum_{i=1}^n \log lik(\theta, \eta^*(\theta))(X_i)$

is semiparametric efficient. In addition, based on (6), we can derive that

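(8)    $\tilde I_0 = E\left(\frac{\partial}{\partial t}\Big|_{t=\theta_0} \log lik(t, \eta^*(t))\right)^{\otimes 2} = -E\left(\frac{\partial^2}{\partial t^2}\Big|_{t=\theta_0} \log lik(t, \eta^*(t))\right).$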

Recall that $\hat S_n(\theta) = \sum_{i=1}^n \log lik(\theta, \hat\eta(\theta))(X_i)$. Define

(9)    $\hat\theta_n = \arg\sup_{\theta \in \Theta} \hat S_n(\theta).$

In view of the above discussions, we can show that $\hat\theta_n$ is semiparametric
efficient if $\hat\eta(\theta)$ is a consistent estimate of $\eta^*(\theta)$. The technical derivations
above can be found in Section 4 of [28]. However, the form of $\hat\theta_n$
depends on how we estimate the abstract $\eta^*(\theta)$ defined in (7). For example, $\hat\theta_n$
is just the semiparametric MLE if $\hat\eta(\theta)$ is the well defined NPMLE. When
the infinite dimensional $\mathcal{H}$ is too large, we may consider estimating $\eta^*(\theta)$
under some form of regularization, e.g., penalization. It is well known that
the convergence rate of $\hat\eta(\theta)$ is determined by the size of $\mathcal{H}$ in terms of its
entropy number and the smoothing parameters associated with regularization
methods (if used), e.g., the smoothing parameter in penalized estimation.

In the following, we will consider two types of $\hat\theta_n$ defined in (9) according
to how we estimate $\eta^*(\theta)$: (i) pure nonparametric estimation in Section 3;
(ii) nonparametric estimation under regularization in Section 4. Define $R_n \asymp r_n$
if $r_n/M \le R_n \le r_n M$ for some $M > 1$. We use $\mathcal{N}(\theta_0)$ to denote a
neighborhood of $\theta_0$. Let $v_i$ denote the $i$-th unit vector in $\mathbb{R}^d$. Define the $i$-th
($(i,j)$-th) element of a vector $V$ (matrix $M$) as $V_i$ ($M_{ij}$). For a tensor $T^{(3)}(\theta)$,
we define $V^T \otimes T^{(3)}(\theta) \otimes V$ as a $d$-dimensional vector with $i$-th element
$V^T (\partial^2/\partial\theta^2)(\dot T(\theta))_i V$, where $\dot T(\theta)$ is the first derivative of $T(\theta)$. Denote int[x]
and int[x] as the smallest nonnegative integer $\ge x$ and $> x$, respectively. The
symbols $\mathbb{P}_n$ and $\mathbb{G}_n = \sqrt{n}(\mathbb{P}_n - P)$ are used for the empirical distribution
and the empirical process of the observations, respectively.

3. Semiparametric Maximum Likelihood Estimation. In this section,
we consider the maximum likelihood estimation of $\theta$, which corresponds
to the case that (i) $\hat\eta(\theta)$ is the NPMLE for $\eta^*(\theta)$ given any fixed $\theta$ and (ii)
$\hat S_n(\theta) = \log pl_n(\theta)$. The pure nonparametric estimation of $\eta^*(\theta)$ is often
feasible when $\eta$ is under shape restrictions, e.g., the monotone cumulative hazard
function. In general, the profile likelihood does not have a closed form since
it is defined as a supremum over an infinite dimensional parameter space,
see (1). In practice, it can only be calculated numerically, e.g., via the
iterative convex minorant algorithm [22]. We first discuss the construction
of $\hat\theta_n^{(k)}$, and then show that the minimal number of iterations $k^*$ is jointly
determined by the convergence rates of $\hat\theta_n^{(0)}$ and $\hat\eta(\theta)$. In the end, two classes
of semiparametric models are presented to illustrate our theories.

Throughout this section, we assume the following convergence rate Condition
(10) and the LFS Conditions M1-M4 specified in the Appendix. For any
random sequence $\tilde\theta_n \xrightarrow{P} \theta_0$, we assume that

(10)    $\|\hat\eta(\tilde\theta_n) - \eta_0\| = O_P(\|\tilde\theta_n - \theta_0\| \vee n^{-r}),$

where $\|\cdot\|$ is some norm in $\mathcal{H}$ and $1/4 < r \le 1/2$. Of course we take the largest
such $r$ in the following and call it the convergence rate for estimating $\eta$. The
above range of $r$ holds in regular semiparametric models, which we can define
without loss of generality to be models where the entropy integral converges.
Theorems 3.1-3.2 in [30] can be applied to calculate the convergence rate in
(10). Under the above regularity conditions, Cheng and Kosorok (2008b)
showed the following second order asymptotic linear expansion result.



Theorem 1. Suppose that Conditions M1-M4 and (10) hold. Also
suppose that the MLE $\hat\theta_n$ is consistent and $\tilde I_0$ is nonsingular. We have

(11)    $\sqrt{n}(\hat\theta_n - \theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \tilde I_0^{-1}\tilde\ell_0(X_i) + O_P(n^{-2r+1/2}).$

We need to estimate $\mathbb{P}_n\tilde\ell_0$ and $\tilde I_0$ to construct $\hat\theta_n^{(k)}$ generated from the NR
algorithm. In view of (6) and (8), we can estimate them based on the
derivatives of the log-profile likelihood (the sample analog of $S_n(\theta)$) as follows:

(12)    $[\hat\ell_n(\theta, s_n)]_i = \frac{\log pl_n(\theta + s_n v_i) - \log pl_n(\theta)}{n s_n},$

(13)    $[\hat I_n(\theta, t_n)]_{i,j} = -\frac{\log pl_n(\theta + t_n(v_i + v_j)) + \log pl_n(\theta)}{n t_n^2} + \frac{\log pl_n(\theta + t_n v_i) + \log pl_n(\theta + t_n v_j)}{n t_n^2}.$

In the above we use numerical derivatives since the smoothness and
differentiability of $\log pl_n(\theta)$ are usually unknown. In Lemma A.1 of the Appendix,
we show that (12) and (13) (also called the observed information in [3(J)
are indeed consistent estimators. Thus, we can write $\hat\theta_n^{(k)}$ in step (III) as

(14)    $\hat\theta_n^{(k)} = \hat\theta_n^{(k-1)} + [\hat I_n(\hat\theta_n^{(k-1)}, t_n^{(k-1)})]^{-1}\hat\ell_n(\hat\theta_n^{(k-1)}, s_n^{(k-1)}),$

where the step sizes satisfy $s_n^{(k-1)} \vee t_n^{(k-1)} = o(1)$. A close inspection of (14)
reveals that we have constructed $\hat\theta_n^{(k)}$ even without knowing the forms of
$\tilde\ell_0$ and $\tilde I_0$.
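The numerical-derivative update can be illustrated for a scalar $\theta$. The sketch below is only an illustration under stated assumptions: it uses a toy profile log-likelihood (normal data with the variance profiled out) as a stand-in for $\log pl_n$, and the step sizes are picked by hand rather than by any rate consideration. The names `log_pl`, `score_hat`, `info_hat` and `nr_update` are hypothetical.

```python
import math

def log_pl(theta, data):
    """Toy profile log-likelihood: N(theta, eta) data with eta profiled out."""
    n = len(data)
    eta = sum((x - theta) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * eta) + 1.0)

def score_hat(logpl, theta, s, n):
    """Forward-difference score estimate for scalar theta (d = 1)."""
    return (logpl(theta + s) - logpl(theta)) / (n * s)

def info_hat(logpl, theta, t, n):
    """Second-difference information estimate for scalar theta, i.e. the
    (i, j) formula with i = j, so that v_i + v_j contributes a step of 2t."""
    second_diff = logpl(theta + 2 * t) + logpl(theta) - 2 * logpl(theta + t)
    return -second_diff / (n * t * t)

def nr_update(logpl, theta, s, t, n):
    """One Newton-Raphson update built only from evaluations of logpl."""
    return theta + score_hat(logpl, theta, s, n) / info_hat(logpl, theta, t, n)

data = [1.0, 2.0, 4.0, 5.0]                  # maximizer of log_pl is 3.0
f = lambda th: log_pl(th, data)
theta1 = nr_update(f, 2.5, s=1e-5, t=1e-3, n=len(data))
print(theta1)
```

Only function evaluations of the profile log-likelihood enter the update, which is the point of the construction: neither the efficient score nor the efficient information needs a closed form.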
The convergence of $\hat\theta_n^{(k)}$ to $\hat\theta_n$, which is exactly the maximizer of $\log pl_n(\theta)$,
as $k \to \infty$ is guaranteed by the asymptotic parabolic form of $\log pl_n(\theta)$
proven in [31]. However, to figure out the minimal $k^*$ such that
$\|\hat\theta_n^{(k^*)} - \hat\theta_n\| = o_P(n^{-1/2})$, we need to make use of the second order asymptotic
quadratic expansion of $\log pl_n(\theta)$ derived in [14] under the above regularity conditions.
As seen from (14), the orders of the step sizes $(s_n^{(k-1)}, t_n^{(k-1)})$ are critical in
determining the convergence rate of $\hat\theta_n^{(k)}$ to $\hat\theta_n$, and thus need to be properly
chosen at each iteration. In the Lemma below, we present the optimal step
sizes, under which the fastest convergence rate is achieved, at each iteration.
Denote the convergence rate of $\|\hat\theta_n^{(k-1)} - \hat\theta_n\|$ as $O_P(n^{-r_{k-1}})$.

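Lemma 1. Suppose that the Conditions in Theorem 1 hold. The convergence
rate of $\|\hat\theta_n^{(k)} - \hat\theta_n\|$ is improved through the following three stages:

(i) $\|\hat\theta_n^{(k)} - \hat\theta_n\| = O_P(\|\hat\theta_n^{(k-1)} - \hat\theta_n\|^{3/2})$ when $r_{k-1} < r$ and we choose
$(s_n^{(k-1)}, t_n^{(k-1)}) \asymp (n^{-3r_{k-1}/2}, n^{-r_{k-1}/2})$;

(ii) $\|\hat\theta_n^{(k)} - \hat\theta_n\| = O_P(\|\hat\theta_n^{(k-1)} - \hat\theta_n\|^{1/2} n^{-r})$ when $r \le r_{k-1} < 1/2$ and we
choose $(s_n^{(k-1)}, t_n^{(k-1)}) \asymp (n^{-r - r_{k-1}/2}, n^{-r_{k-1}/2})$;

(iii) $\|\hat\theta_n^{(k)} - \hat\theta_n\| = O_P(n^{-r-1/4})$ when $r_{k-1} \ge 1/2$ and we choose
$(s_n^{(k-1)}, t_n^{(k-1)}) \asymp (n^{-r-1/4}, n^{-r_{k-1}/2})$.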
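The three stages translate into a simple recursion on the rate exponent: the exponent is multiplied by 3/2 while it is below $r$, becomes half the previous exponent plus $r$ while it lies between $r$ and 1/2, and equals $r + 1/4$ afterwards. The sketch below tracks this recursion with exact fractions; it is an illustrative reading of the lemma, and the names `next_rate`, `rate_path` and `minimal_iterations` are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def next_rate(r_prev, r):
    """Rate exponent after one more iteration, following stages (i)-(iii)."""
    if r_prev < r:                        # stage (i): exponent times 3/2
        return Fraction(3, 2) * r_prev
    if r_prev < HALF:                     # stage (ii): r_prev / 2 + r
        return r_prev / 2 + r
    return r + Fraction(1, 4)             # stage (iii): the bound r + 1/4

def rate_path(psi, r, steps):
    """Exponents r_1, ..., r_steps for an n^psi-consistent initial estimate."""
    path, current = [], Fraction(psi)
    for _ in range(steps):
        current = next_rate(current, Fraction(r))
        path.append(current)
    return path

def minimal_iterations(psi, r, max_iter=100):
    """Smallest k whose exponent exceeds 1/2, i.e. the first k for which the
    distance to the exact maximizer is o_P(n^{-1/2})."""
    for k, rate in enumerate(rate_path(psi, r, max_iter), start=1):
        if rate > HALF:
            return k
    raise ValueError("exponent 1/2 not exceeded within max_iter iterations")

# psi = 1/4 with r = 1/3 (the rate of the Cox model with current status data)
print(rate_path(Fraction(1, 4), Fraction(1, 3), 3))
print(minimal_iterations(Fraction(1, 4), Fraction(1, 3)))
```

With $\psi = 1/4$ and $r = 1/3$ the recursion gives the exponents 3/8, 25/48 and 7/12, and the first exponent above 1/2 occurs at the second iteration, which matches the entries reported for the Cox model in Table 1.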

Now we present our first main theorem, i.e., Theorem 2. Let $\hat\theta_n^{(0)}$ be
$n^\psi$-consistent. We first show that $\|\hat\theta_n^{(k)} - \hat\theta_n\| = O_P(n^{-S(\psi,r,k)})$, based on which
we figure out the value of $k^*$ in (16). According to Lemma 1, it
is easily seen that $S(1/2, r, k) = r + 1/4$ for any $1/4 < r \le 1/2$ and $k \ge 1$
(thus $k^* = 1$); and $S(1/3, 1/2, 1) = 1/2$ and $S(1/3, 1/2, k) = 3/4$ for any
$k \ge 2$ (thus $k^* = 2$). Following similar logic, we can give the general form of
$S(\psi, r, k)$ as follows. Define, if $\bar S_1(\psi, r) \ge 1/2$,

    $S(\psi, r, k) = \begin{cases} S_1(\psi, k) & k \le K_1(\psi, r) \\ r + 1/4 & k \ge K_1(\psi, r) + 1 \end{cases}$

where $S_1(\psi, k) = \psi(3/2)^k$, $K_1(\psi, r) = int[\log(r/\psi)/\log(3/2)]$ and
$\bar S_1(\psi, r) = S_1(\psi, K_1(\psi, r))$, and if $r \le \bar S_1(\psi, r) < 1/2$,

    $S(\psi, r, k) = \begin{cases} S_1(\psi, k) & k \le K_1(\psi, r) \\ S_2(\bar S_1(\psi, r), r, k - K_1(\psi, r)) & K_1(\psi, r) < k \le K_1(\psi, r) + \bar K_2(\psi, r) \\ r + 1/4 & k \ge K_1(\psi, r) + \bar K_2(\psi, r) + 1 \end{cases}$

where $S_2(\psi, r, k) = 2r + 2^{-k}(\psi - 2r)$, $K_2(\psi, r) = int[\log\{(2r - \psi)/(2r - 1/2)\}/\log 2]$
and $\bar K_2(\psi, r) = K_2(\bar S_1(\psi, r), r)$.

Theorem 2. Suppose that the Conditions in Theorem 1 hold and proper
step sizes are chosen according to Lemma 1. Let $\hat\theta_n^{(k)}$ be the $k$-step estimator
defined in (14) and $\hat\theta_n^{(0)}$ be $n^\psi$-consistent for $0 < \psi \le 1/2$. Recall that
$\|\hat\theta_n^{(k)} - \hat\theta_n\| = O_P(n^{-r_k})$. We show that $r_k$ increases from $\psi$ to $(r + 1/4)$ as
$k \to \infty$. Specifically, we have

(15)    $\|\hat\theta_n^{(k)} - \hat\theta_n\| = O_P(n^{-S(\psi,r,k)}).$

This implies that

(16)    $\|\hat\theta_n^{(k^*)} - \hat\theta_n\| = o_P(n^{-1/2}),$

where $k^* = K_1(\psi, r) + int[\log((2r - \bar S_1(\psi, r))/(2r - 1/2))/\log 2]$.

Interestingly, we notice that the optimal bound of $\|\hat\theta_n^{(k)} - \hat\theta_n\|$, i.e., $O_P(n^{-r-1/4})$,
is intrinsically determined by how accurately we estimate the nuisance
parameter, i.e., the value of $r$. This bound cannot be further improved unless
we are willing to make stronger assumptions than M1-M4, which seem
unrealistic. From the form of $S(\psi, r, k)$, we find that a more accurate initial
estimate leads to higher order asymptotic efficiency of $\hat\theta_n^{(k)}$. How to obtain
$\hat\theta_n^{(0)}$ with a proper convergence rate will be discussed in Section 5.

We apply Theorem 2 to the following two examples, whose detailed technical
illustrations and model assumptions can be found in [31, 34]. The
required Conditions in Theorem 2 are verified in [Ijl, 14] for Examples 1-2.
We can also apply our theory to the semiparametric regression model under
shape constraints, e.g., [11].



Example 1: Cox Model under Current Status Data

In the Cox proportional hazards model, the hazard function of the survival
time $T$ of a subject with covariate $Z$ is expressed as:

    $\lambda(t|z) \equiv \lim_{\Delta \downarrow 0} \frac{1}{\Delta}\Pr(t \le T < t + \Delta \mid T \ge t, Z = z) = \lambda(t)\exp(\theta' z),$

where $\lambda$ is an unspecified baseline hazard function. We consider the current
status data where each subject is observed at a single examination time $Y$ to
determine if an event has occurred, but the event time $T$ cannot be known
exactly. Specifically, the observed data are $n$ realizations of $X = (Y, \delta, Z) \in
\mathbb{R}^+ \times \{0, 1\} \times \mathbb{R}$, where $\delta = I\{T \le Y\}$. The cumulative hazard function
$\eta(y) = \int_0^y \lambda(t)dt$ is considered as the nuisance parameter. The parameter
space $\mathcal{H}$ for $\eta$ is restricted to a set of nondecreasing and cadlag functions on
some compact interval. In this model, it is well known that both $\hat\eta(\theta)$ and
$\log pl_n(\theta)$ have no explicit forms, and can only be calculated numerically via
the iterative convex minorant algorithm, see [23]. As for the convergence rate
of $\hat\eta$, Murphy and van der Vaart (1999) showed
$\|\hat\eta(\tilde\theta_n) - \eta_0\|_2 = O_P(\|\tilde\theta_n - \theta_0\| \vee n^{-1/3})$, where $\|\cdot\|_2$ is the $L_2$ norm.
According to Theorem 2, we establish the following table to depict the convergence
of $\hat\theta_n^{(k)}$ to $\hat\theta_n$ given different initial estimates until it reaches the lower
bound $O_P(n^{-7/12})$.



Table 1. Cox Model under Current Status Data ($r = 1/3$)

              $\psi = 1/2$      $\psi = 1/3$                   $\psi = 1/4$
Cox Models    $r_1 = 7/12$      $r_1 = 1/2$, $r_2 = 7/12$      $r_1 = 3/8$, $r_2 = 25/48$, $r_3 = 7/12$
              $k^* = 1$         $k^* = 2$                      $k^* = 2$

Remark: Define $\|\hat\theta_n^{(k)} - \hat\theta_n\| = O_P(n^{-r_k})$.



Example 2: Semiparametric Mixture Model in Case-Control Studies

Roeder, Carroll and Lindsay (1996) consider the logistic regression model
with a missing covariate for case-control studies. In this model, they observe
two independent random samples: one complete component $Y_C = (D_C, W_C)$
and $Z_C$ of the size $n_C$, and one reduced component $Y_R = (D_R, W_R)$ of
the size $n_R$. Following the assumptions given in [34], the likelihood for
$x = (y_C, y_R, z_C)$ is defined as

    $lik(\theta', \eta)(x) = p_{\theta'}(y_C|z_C)\,\eta\{z_C\} \int p_{\theta'}(y_R|z)\,d\eta(z),$

where $d\eta$ denotes the density of $\eta$ w.r.t. some dominating measure, and

    $p_{\theta'}(y|z) = \left(\frac{\exp(\gamma + \theta e^z)}{1 + \exp(\gamma + \theta e^z)}\right)^{d} \left(\frac{1}{1 + \exp(\gamma + \theta e^z)}\right)^{1-d} \phi_\sigma(w - \alpha_0 - \alpha_1 z),$

where $\phi_\sigma(\cdot)$ denotes the density for $N(0, \sigma)$. The unknown parameters are
$\theta' = (\theta, \alpha_0, \alpha_1, \gamma, \sigma)$ ranging over the compact $\Theta' \subset \mathbb{R}^4 \times (0, \infty)$ and the
distribution $\eta$ of the regression variable restricted to the set of nondegenerate
probability distributions with a known compact support. In this semiparametric
mixture model, we will concentrate on the regression coefficient $\theta$,
considering $\theta_2 = (\alpha_0, \alpha_1, \gamma, \sigma)$ and $\eta$ as nuisance parameters. The NPMLE
$\hat\eta(\theta')(z)$ is a weighted average of two empirical distributions and the
log-profile likelihood implicitly defined as

    $\hat S_n(\theta) = \log pl_n(\theta) = \sup_{\theta_2, \eta} \log lik_n(\theta', \eta)$

has no explicit form. Let $(\hat\theta_{2,\theta}, \hat\eta(\theta))$ be the profile likelihood estimator for
$(\theta_2, \eta)$ so that $\hat\theta'_\theta = (\theta, \hat\theta_{2,\theta})$. Both $\hat\eta(\theta)$ and $\hat S_n(\theta)$ can be computed efficiently
via the iterative algorithm in Section 4 of [34], a special case of our general
algorithm. Murphy and van der Vaart (1999) showed that, for any $\tilde\theta_n \xrightarrow{P} \theta_0$,

(17)    $\|\hat\eta(\tilde\theta_n) - \eta_0\|_{BL_1} + \|\hat\theta'_{\tilde\theta_n} - \theta'_0\| = O_P(|\tilde\theta_n - \theta_0| \vee n^{-1/2}),$

where $\|\cdot\|_{BL_1}$ is the bounded Lipschitz norm, which metrizes the weak topology.
This implies that $r = 1/2$. The following Table 2 is similar to Table 1.
Interestingly, we find that $\hat\theta_n^{(k)}$ converges to $\hat\theta_n$ at a faster rate in the
second model.



Table 2. Semiparametric Mixture Model in Case-Control Studies ($r = 1/2$)

                  $\psi = 1/2$     $\psi = 1/3$                  $\psi = 1/4$
Mixture Models    $r_1 = 3/4$      $r_1 = 1/2$, $r_2 = 3/4$      $r_1 = 3/8$, $r_2 = 9/16$, $r_3 = 3/4$
                  $k^* = 1$        $k^* = 2$                     $k^* = 2$

Remark: Define $\|\hat\theta_n^{(k)} - \hat\theta_n\| = O_P(n^{-r_k})$.

4. Semiparametric Estimation under Regularization. In this section,
we consider the semiparametric estimation under two types of regularization,
i.e., kernel estimation and penalized estimation. In contrast with
the profile likelihood estimation, the regularized $\hat S_n(\theta)$ is usually
differentiable although its form may vary under different regularizations. We first
present a unified framework for studying $\hat\theta_n^{(k)}$ when $\hat S_n(\theta)$ is third order
differentiable, and then present several examples corresponding to different
regularizations which fit into this framework. We also discuss the variable
selection in partly linear models as an extension of the penalized estimation.

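In this section, we construct $\hat\theta_n^{(k)}$ in step (III) as follows:

(18)    $\hat\theta_n^{(k)} = \hat\theta_n^{(k-1)} + [\hat I_n(\hat\theta_n^{(k-1)})]^{-1}\hat\ell_n(\hat\theta_n^{(k-1)}),$

where $\hat\ell_n(\cdot) = \hat S_n^{(1)}(\cdot)/n$ and

(19)    $\hat I_n(\cdot) = -\hat S_n^{(2)}(\cdot)/n,$

where $\hat S_n^{(j)}(\cdot)$ is the $j$-th derivative of $\hat S_n(\cdot)$. When $\hat S_n^{(2)}(\theta)$ has no explicit
form or is hard to compute, we may prefer constructing $[\hat I_n(\theta)]_{ij}$ as

(20)    $-n^{-1/2}\,\frac{[\hat S_n^{(1)}(\theta + n^{-1/2} t_2 v_j)]_i - [\hat S_n^{(1)}(\theta + n^{-1/2} t_1 v_j)]_i}{t_2 - t_1},$

where $t_1$ and $t_2$ ($t_1 < t_2$) are arbitrarily fixed real numbers.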
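The difference-quotient construction of the information can be illustrated for a scalar parameter. The sketch below is a toy illustration under stated assumptions, not the paper's procedure: it uses a parametric Poisson log-likelihood with log-rate $\theta$ (so there is no nuisance parameter) because its score and information are known exactly, which makes the approximation easy to check. The names `score_sum`, `info_fd` and `one_step` are hypothetical.

```python
import math

def score_sum(theta, counts):
    """First derivative of the toy Poisson log-likelihood
    sum_i (x_i * theta - exp(theta)), with theta the log-rate."""
    return sum(counts) - len(counts) * math.exp(theta)

def info_fd(score, theta, n, t1=-1.0, t2=1.0):
    """Difference quotient of the score between theta + n^{-1/2} t1 and
    theta + n^{-1/2} t2, scaled to estimate the per-observation information."""
    h = n ** -0.5
    return -h * (score(theta + h * t2) - score(theta + h * t1)) / (t2 - t1)

def one_step(score, theta, n):
    """One Newton-type update using the normalized score and info_fd."""
    return theta + (score(theta) / n) / info_fd(score, theta, n)

counts = [i % 5 for i in range(100)]     # sample mean 2, so the MLE is log(2)
score = lambda th: score_sum(th, counts)
theta0 = math.log(2.0) + 0.1
theta1 = one_step(score, theta0, len(counts))
print(info_fd(score, theta0, len(counts)), math.exp(theta0), theta1)
```

In this toy model the exact information at $\theta$ is $\exp(\theta)$, so the first two printed numbers can be compared directly. The points $t_1 = -1$ and $t_2 = 1$ are arbitrary, in line with the statement that any fixed pair with $t_1 < t_2$ may be used.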
Recall that

    $S_n(\theta) = n\mathbb{P}_n \log lik(\theta, \eta^*(\theta))$

and define $S_n^{(j)}(\cdot)$ as the $j$-th derivative of $S_n(\cdot)$. In view of the discussions
in Section 2, i.e., (6) & (8), we expect that $\hat\theta_n^{(k)}$ converges to $\hat\theta_n$ if $\hat S_n^{(j)}(\cdot)$
approximates $S_n^{(j)}(\cdot)$ well enough around $\theta_0$ for $j = 1, 2, 3$. Therefore, we
assume the following general condition G.

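G. Assume that

(21)    $\frac{1}{n}\hat S_n^{(1)}(\theta_0) - \frac{1}{n}S_n^{(1)}(\theta_0) = O_P(n^{-2g}),$

(22)    $\sup_{\theta \in \mathcal{N}(\theta_0)} \left|\frac{1}{n}\hat S_n^{(2)}(\theta) - \frac{1}{n}S_n^{(2)}(\theta)\right| = O_P(n^{-g}),$

(23)    $\sup_{\theta \in \mathcal{N}(\theta_0)} \left|\frac{1}{n}\hat S_n^{(3)}(\theta)\right| = O_P(1),$

where $1/4 < g \le 1/2$.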



We will provide two sets of sufficient conditions for G in the kernel
estimation, where the value of $g$ is determined by the bandwidth order, and in the
penalized estimation, where the value of $g$ is determined by the smoothing
parameter, respectively. In this sense, we can think of $g$ as a measure of the
convergence rate of $\hat\eta(\theta)$ as in (10). We may verify (23) by showing

(24)    $\sup_{\theta \in \mathcal{N}(\theta_0)} \left|\frac{1}{n}\hat S_n^{(3)}(\theta) - \frac{1}{n}S_n^{(3)}(\theta)\right| = o_P(1),$

and that the class of functions $\{(\partial^3/\partial\theta^3)\log lik(x; \theta, \eta^*(\theta)) : \theta \in \mathcal{N}(\theta_0)\}$ is
P-Glivenko-Cantelli and that

    $\sup_{\theta \in \mathcal{N}(\theta_0)} E|(\partial^3/\partial\theta^3)\log lik(X; \theta, \eta^*(\theta))| < \infty.$

Now we present our second main theorem, i.e., Theorem 3. Define

(25)    $R(\psi, g, k) = \begin{cases} R_1(\psi, g, k) & k \le L_1(\psi, g) \\ R_2(R_1(\psi, g, L_1(\psi, g)), g, k - L_1(\psi, g)) & k > L_1(\psi, g) \end{cases}$

where $R_1(\psi, g, k) = (1/2 - g) + 2^k(\psi + g - 1/2)$,
$L_1(\psi, g) = int[\log(g/(g + \psi - 1/2))/\log 2]$ and $R_2(\psi, g, k) = kg + \psi$.



imsart-aos ver. 2006/01/04 file: KPMLE_vl0.tex date: September 23, 2010 






GUANG CHENG 



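Theorem 3. Suppose that Condition G holds, $\hat\theta_n$ defined in (9) is
consistent and $\tilde I_0$ is nonsingular. We have

(26)    $\sqrt{n}(\hat\theta_n - \theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \tilde I_0^{-1}\tilde\ell_0(X_i) + O_P(n^{-2g+1/2}).$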

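Let $\hat\theta_n^{(k)}$ be the $k$-step estimator defined in (18) and $\hat\theta_n^{(0)}$ be $n^\psi$-consistent for
$(1/2 - g) < \psi \le 1/2$. Define $\|\hat\theta_n^{(k)} - \hat\theta_n\| = O_P(n^{-r_k})$. We show that $r_k$
increases from $\psi$ to $\infty$ as $k \to \infty$. Specifically, we show

(27)    $\|\hat\theta_n^{(k)} - \hat\theta_n\| = O_P(n^{-2^k\psi})$ if $\hat I_n(\cdot)$ is defined in (19),

(28)    $\|\hat\theta_n^{(k)} - \hat\theta_n\| = O_P(n^{-R(\psi,g,k)})$ if $\hat I_n(\cdot)$ is defined in (20).

This implies that

    $\|\hat\theta_n^{(k^*)} - \hat\theta_n\| = o_P(n^{-1/2}),$

where $k^* = int[\log(1/(2\psi))/\log 2]$ for (27) and $k^* = L_1(\psi, g)$ for (28).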



Note that (27) is a statistical counterpart to the well known quadratic
convergence of the Newton-Raphson algorithm; see Page 312 of [32].
Theorems 2 and 3 imply that (i) more than $k^*$ iterations, i.e., $k$, will not change
the limiting distribution of $\hat\theta_n^{(k)}$, but will improve its higher order asymptotic
efficiency; (ii) the higher order asymptotic efficiency of $\hat\theta_n^{(k)}$ is determined by
how accurately $\eta$ is estimated, i.e., the values of $r$ and $g$; (iii) $\hat\theta_n^{(k)}$ converges
to $\hat\theta_n$ faster when $\hat I_n$ is constructed as an analytical derivative, no matter
whether regularization is used or not.

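Remark 1. Given that the initial estimate is $\sqrt{n}$-consistent, we have

    $\|\hat\theta_n^{(k)} - \hat\theta_n\| = \begin{cases} O_P(n^{-2^{k-1}}) & \text{if } \hat I_n(\cdot) \text{ is constructed as in (19)}, \\ O_P(n^{-(1/2+kg)}) & \text{if } \hat I_n(\cdot) \text{ is constructed as in (20)}, \end{cases}$

based on Theorem 3. This implies $k^* = 1$.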
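The rate exponents in Theorem 3 can be tabulated with exact fractions. The sketch below is illustrative only. It assumes that the cut-off $L_1(\psi, g)$ is read as the smallest nonnegative integer $k$ with $2^k(\psi + g - 1/2) \ge g$, and it finds the minimal number of iterations by searching for the first exponent above 1/2 rather than through a closed form. All function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def rate_analytic(psi, k):
    """Exponent 2^k * psi when the information uses the analytic derivative."""
    return Fraction(psi) * 2 ** k

def r1(psi, g, k):
    return (HALF - g) + 2 ** k * (psi + g - HALF)

def r2(psi, g, k):
    return k * g + psi

def l1(psi, g):
    """Smallest nonnegative integer k with 2^k (psi + g - 1/2) >= g
    (an assumed reading of the integer-part cut-off)."""
    if psi + g <= HALF:
        raise ValueError("requires psi > 1/2 - g")
    k = 0
    while 2 ** k * (psi + g - HALF) < g:
        k += 1
    return k

def rate_fd(psi, g, k):
    """Exponent R(psi, g, k) when the information is a difference quotient."""
    psi, g = Fraction(psi), Fraction(g)
    cut = l1(psi, g)
    if k <= cut:
        return r1(psi, g, k)
    return r2(r1(psi, g, cut), g, k - cut)

def k_star(rate, max_iter=100):
    """Smallest k >= 1 whose exponent exceeds 1/2."""
    for k in range(1, max_iter + 1):
        if rate(k) > HALF:
            return k
    raise ValueError("exponent 1/2 not exceeded within max_iter iterations")

g = Fraction(1, 3)
print([rate_fd(HALF, g, k) for k in range(1, 4)])
print(k_star(lambda k: rate_analytic(Fraction(1, 4), k)))
```

For a $\sqrt{n}$-consistent initial estimate the exponents reduce to $2^{k-1}$ and $1/2 + kg$, as stated in Remark 1, so a single iteration suffices in both constructions.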

A by-product of Theorem 3 is the application to parametric models,
i.e., $\eta$ is known. In this case, $\hat S_n(\theta)$ is built from $\ell_\theta(X) = \log lik(\theta; X)$, and we
simplify the general Condition G to the following Conditions P1-P2. Denote
the first, second and third derivatives of $\ell_\theta(\cdot)$ w.r.t. $\theta$ as $\dot\ell_\theta(\cdot)$, $\ddot\ell_\theta(\cdot)$ and $\ell_\theta^{(3)}(\cdot)$,
respectively. The information matrix at $\theta_0$ is defined as $I_0$.

P1. $\dot\ell_\theta(\cdot)$ and $\ddot\ell_\theta(\cdot)$ are absolutely continuous in $\theta$.
P2. There exists a $\delta > 0$ such that, for any $|t| \le \delta$,

< Kj for some finite constant Kj 



(29) E $£W 5 



where i = 1,2. 

We can easily prove Corollary 1 by following an analysis similar to that of
Theorem 3 and considering Lemma A.4. Thus, its proof is skipped.

Corollary 1. Suppose that Conditions P1 & P2 hold. Also suppose
that the parametric MLE $\hat\theta_n$ is consistent and $I_0$ is nonsingular. Let $g = 1/2$.
Then all the conclusions for $\hat\theta_n$ and $\hat\theta_n^{(k)}$ in Theorem 3 hold for the parametric
estimation.

The above corollary generalizes the one/two-step parametric estimation
results in [24]. Comparing Theorem 3 with Corollary 1, we notice that $\hat\theta_n^{(k)}$
converges to $\hat\theta_n$ at a slower rate in semiparametric models. This results
from the presence of an infinite dimensional $\eta$ estimated at a slower-than-parametric
rate, as can be seen by comparing Lemmas A.3 and A.4.

Remark 2. We would like to mention that the regularized $\hat S_n(\theta)$ may
not be differentiable in some semiparametric models, e.g., the penalized
estimation of partly linear models under current status data studied in F7j. In
such cases, we can take the discretization approach to construct $\hat\theta_n^{(k)}$ as in
the profile likelihood framework, i.e., (14), and obtain similar results as in
Theorem 2 if we can prove that the non-smooth $\hat S_n(\theta)$ shares the same higher
order quadratic expansion as $\log pl_n(\theta)$. Indeed, Cheng and Kosorok (2009)
have proven such results for the non-smooth regularized $\hat S_n(\theta)$ under weaker
conditions. See fldl] for more elaborations.

4.1. Kernel Estimation in Semiparametric Models. In this subsection,
we consider the kernel estimation in semiparametric models. Due to its
simple form, the kernel estimate of $\eta$ and the related iterative algorithm for
estimating $\theta$ are widely used in semiparametric models, e.g., [1, 13]. In
particular, the kernel approach is proven to be a powerful inferential tool for
the class of conditionally parametric models (CPM), see [38, 39]. Thus, in
this subsection, we will focus on the class of CPM although our conclusions
can be extended to more general classes of semiparametric models by
incorporating the results in 0]. Under kernel estimation, $k^*$ is shown to depend
on the order of the bandwidth used in the kernel function.

The class of CPM was first introduced by Severini and Wong (1992)
and further generalized to the quasi-likelihood framework by Severini and
Staniswalis (1994). Specifically, we observe $X = (Y, W, Z)$ such that the
distribution of $Y$ conditional on the partitioned covariates $W = w$ and $Z = z$
is parameterized by a finite dimensional parameter $\phi = (\theta, \lambda_z)$, where
$\lambda_z \in H \subset \mathbb{R}$ depends on the value of $z$ as a function $\eta(z)$. The joint distribution of
$(W, Z)$ is assumed to be independent of $\phi$. Thus, this semiparametric model
has the log-likelihood $\log lik(X; \theta, \eta(z))$ and is called conditionally parametric.
The practical performance of the iterative estimation procedure (I)-(IV)
for the CPM is extensively studied in (3^.

We assume that $\eta(z) \in \mathcal{H} = \{h \in C^2(\mathcal{Z}) : h(z) \in \mathrm{interior}(H) \text{ for all } z \in \mathcal{Z}\}$.
An important feature of CPM is that its least favorable curve can be
expressed as (see [38] for details)

(30)    $\eta^*(\theta)(z) = \arg\sup_{\eta \in C^2[0,1]} E[\log lik(X; \theta, \eta) \mid Z = z],$

and thus its kernel estimate is written as

(31)    $\hat\eta(\theta)(z) = \arg\sup_{\eta \in C^2[0,1]} \sum_{i=1}^n \log lik(X_i; \theta, \eta)\,K\left(\frac{z - Z_i}{b_n}\right),$

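where $K(\cdot)$ is a kernel with the bandwidth $b_n \to 0$. For example, if
$(Y \mid W = w, Z = z) \sim N(\theta' w, \eta(z))$, then we have

(32)    $\hat\eta(\theta)(z) = \frac{\sum_{i=1}^n (Y_i - \theta' W_i)^2 K((z - Z_i)/b_n)}{\sum_{i=1}^n K((z - Z_i)/b_n)}.$

Although $\hat\eta(\theta)$ (and thus $\hat S_n(\theta)$) solved from (31) generally has no explicit
form, based on (31) we can control the asymptotic behaviors of $\hat\eta(\theta)$ (and
thus $\hat S_n(\theta)$) by assuming proper kernel conditions, see Example 3 below.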
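For the conditional normal model, maximizing the kernel-weighted normal log-likelihood over the variance at a point $z$ gives a kernel-weighted average of the squared residuals $(Y_i - \theta' W_i)^2$. The sketch below computes this estimate for a scalar covariate $W$. It is a minimal illustration under stated assumptions: the Gaussian kernel, the bandwidth value and the names `gaussian_kernel` and `eta_hat` are choices made here for illustration and are not taken from the paper.

```python
import math

def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the normalizing constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)

def eta_hat(theta, z, data, bandwidth, kernel=gaussian_kernel):
    """Kernel-weighted average of squared residuals (y - theta * w)^2 at the
    point z, for observations (y, w, z_i) with scalar w."""
    num = 0.0
    den = 0.0
    for y, w, z_i in data:
        weight = kernel((z - z_i) / bandwidth)
        num += weight * (y - theta * w) ** 2
        den += weight
    return num / den

# Toy data: y = 2 * w + e with e alternating between +1 and -1, so every
# squared residual at theta = 2 equals 1 regardless of z.
data = []
for i in range(20):
    w = i / 10.0
    e = 1.0 if i % 2 == 0 else -1.0
    data.append((2.0 * w + e, w, i / 19.0))

print(eta_hat(2.0, 0.3, data, bandwidth=0.2))
print(eta_hat(1.5, 0.3, data, bandwidth=0.2))
```

Because the estimate depends on $\theta$ only through the residuals, it can be recomputed cheaply at each iterate, which is what step II of the iterative algorithm requires.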

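By exploiting the parametric structure of CPM, we will show that $\hat S_n(\theta)$
satisfies the general Condition G under Conditions K1-K2 and C1-C2 below.

K1. For arbitrary $\theta_1 \in \Theta$ and $\lambda_1 \in H$, if $\theta \ne \theta_1$, then
$E_{\theta_1,\lambda_1}\log lik(X; \theta, \lambda) < E_{\theta_1,\lambda_1}\log lik(X; \theta_1, \lambda_1)$;

K2. Assume that

(33)    $E\left\{\sup_{(\theta,\lambda) \in \Theta \times H} \left|\frac{\partial^{r+s}\log lik(X; \theta, \lambda)}{\partial\theta^r \partial\lambda^s}\right|\right\} < \infty$

for all $r, s = 0, \ldots, 4$ and $r + s \le 4$.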
Similar identifiability Condition K1 and smoothness Condition K2 are also
used in [3^j. Our next conditions C1-C2 are concerned with the smoothness
and convergence rate of $\eta^*(\theta)$ and $\hat\eta(\theta)$. We denote the $s$-th derivative of $\eta^*(\theta)$
($\hat\eta(\theta)$) w.r.t. $\theta$ as $\eta_*^{(s)}(\theta)$ ($\hat\eta^{(s)}(\theta)$), and their values at $\theta_0$ as $\eta_{*0}^{(s)}$ ($\hat\eta_0^{(s)}$).

C1. Assume that, for all $r, s = 0, 1, 2, 3$ and $r + s \le 3$,

    $\frac{\partial^{r+s}}{\partial z^r \partial\theta^s}\eta^*(\theta)(z)$ and $\frac{\partial^{r+s}}{\partial z^r \partial\theta^s}\hat\eta(\theta)(z)$

exist and $\sup_{\theta \in \mathcal{N}(\theta_0)} \|\eta_*^{(s)}(\theta)\|_\infty < \infty$.
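C2. Assume that

(34)    $\sup_{\theta \in \mathcal{N}(\theta_0)} \|\hat\eta^{(s)}(\theta) - \eta_*^{(s)}(\theta)\|_\infty = O_P(n^{-g})$ for $s = 0, 1, 2$,

(35)    $\sup_{\theta \in \mathcal{N}(\theta_0)} \|\hat\eta^{(3)}(\theta) - \eta_*^{(3)}(\theta)\|_\infty = o_P(1),$

(36)    $\left\|\frac{\partial}{\partial z}\hat\eta_0 - \frac{\partial}{\partial z}\eta_0\right\|_\infty = o_P(n^{-\delta}),$

(37)    $\left\|\frac{\partial}{\partial z}\hat\eta_0^{(1)} - \frac{\partial}{\partial z}\eta_{*0}^{(1)}\right\|_\infty = o_P(n^{-\delta}),$

for some $g \in (1/4, 1/2]$ and $(2g - 1/2) < \delta < g$.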

In view of (30)-(31), we can verify C2 by applying the kernel theories
under some proper kernel conditions and K1-K2. For example, in Lemma 2,
we show that the convergence rate of the kernel estimate in (34), which
determines the value of $g$ in (21)-(22), relies on the order of the bandwidth $b_n$
used in (31). Note that Condition C2 also implies (10) assumed for the
NPMLE since

    $\|\hat\eta(\tilde\theta_n) - \eta_0\|_\infty \le \|\hat\eta(\tilde\theta_n) - \hat\eta(\theta_0)\|_\infty + \|\hat\eta(\theta_0) - \eta^*(\theta_0)\|_\infty \le O_P(\|\tilde\theta_n - \theta_0\| \vee n^{-g})$

by the construction that $\eta^*(\theta_0) = \eta_0$, C1-C2 and (34). Our Conditions K1-K2
and C1-C2 are generally stronger than M1-M4 and (10) since the semiparametric
models under consideration have the assumed parametric structure.

Theorem 4. Assume that Conditions K1-K2 and C1-C2 hold. Then Condition G required in Theorem 3 is satisfied for the kernel estimation in conditionally parametric models.

The consistency of $\hat\theta_n$ required in Theorem 3 can be established if we further require the global condition $\sup_{\theta \in \Theta} \|\hat\eta(\theta) - \eta^*(\theta)\|_\infty \to 0$, see Proposition 1 of [38]. In the next example, we apply Theorems 3 and 4 to a subclass of CPM, called conditionally exponential models (CEM), in which $\hat\eta(\theta)$ has a closed form. This makes the verification of C1-C2 much easier. The relation between $k^*$ and the order of $b_n$ in (31) is also specified in the example below. We may also apply our theories to the more complicated semiparametric transformation model, see [27].
Example 3. Conditionally Exponential Models

In CEM, there exists a function $\psi_\theta(\cdot)$ such that the conditional distribution of $\psi_\theta(Y, W)$ given $Z = z$ does not depend on $\theta$ and forms an exponential family. Its log-likelihood can be expressed as

$\log lik(X; \theta, \eta) = \psi_\theta(Y, W)\, T(\eta(Z)) - A(\eta(Z)) + S(\psi_\theta(Y, W))$

for some functions $T$, $A$ and $S$. Some simple algebra gives that

(38) $\hat\eta(\theta)(z) = \rho\left( \frac{\sum_{i=1}^{n} \psi_\theta(Y_i, W_i) K((z - Z_i)/b_n)}{\sum_{i=1}^{n} K((z - Z_i)/b_n)} \right),$

where $\eta = \rho\{E_{\theta,\eta}(\psi_\theta(Y, W))\}$. In the previous conditional normal model, we have $\psi_\theta(Y, W) = (Y - \theta' W)^2$ and $\rho(t) = t$. Another example is that $(Y \mid W = w, Z = z) \sim Exp(0, \exp(\theta' w + \eta(z)))$, in which $\psi_\theta(Y, W) = Y \exp(-\theta' W)$ and $\rho(t) = \log t$.
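As an illustration of the closed form available in CEM, the following is a minimal sketch of the kernel estimate $\rho(\hat m_\theta(z))$, where $\hat m_\theta(z)$ is the kernel-weighted average of $\psi_\theta(Y_i, W_i)$. The function names, the Gaussian kernel and the array layout are assumptions made for this sketch and do not come from the paper.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as an assumed choice of K."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def eta_hat(theta, z0, y, w, z, bandwidth, psi, rho):
    """Kernel estimate rho(m_hat_theta(z0)) at the evaluation points z0."""
    z0 = np.atleast_1d(np.asarray(z0, dtype=float))
    # weights[a, i] = K((z0[a] - z[i]) / b_n)
    weights = gaussian_kernel((z0[:, None] - z[None, :]) / bandwidth)
    # Kernel-weighted average of psi_theta(Y_i, W_i) at each z0[a]
    m_hat = weights @ psi(theta, y, w) / weights.sum(axis=1)
    return rho(m_hat)


def psi_normal(theta, y, w):
    """Conditional normal model: psi = (Y - theta'W)^2, paired with rho(t) = t."""
    return (y - w @ theta) ** 2


def psi_exponential(theta, y, w):
    """Conditional exponential model: psi = Y exp(-theta'W), paired with rho(t) = log t."""
    return y * np.exp(-(w @ theta))
```

For the conditional normal model one would call `eta_hat(theta, z0, y, w, z, b, psi_normal, lambda t: t)`, and for the exponential model `eta_hat(theta, z0, y, w, z, b, psi_exponential, np.log)`.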

It is easy to verify that Conditions K1-K2 are satisfied for the above two models if $\Theta \times H$ is assumed to be compact. We will verify Conditions C1-C2 by applying the following Lemma. Let $\psi_\theta^{(j)}(\cdot)$ be $(\partial^j/\partial\theta^j)\psi_\theta(\cdot)$ and $f_\theta^{(j)}(\cdot \mid z)$ be its conditional density. Denote $f(z)$ as the marginal density of $Z$. Let $M$ be a compact set so that $m_\theta(z) = E[\psi_\theta(Y, W) \mid Z = z] \in int(M)$ for all $z, \theta$.

Lemma 2. Assume the following conditions hold:

(a) $E\{\sup_\theta |\psi_\theta^{(j)}|\} < \infty$ for $j = 0, 1, 2, 3$;

(b) For some even integer $q \ge 10$, $\sup_\theta E\{|\psi_\theta^{(j)}|^q\} < \infty$ for $j = 0, 1, 2, 3$;

(c) $\sup_\theta \sup_{y,w,z} |(\partial^r/\partial z^r) f_\theta^{(j)}(y, w \mid z)| < \infty$ for $j = 0, 1, 2$ and $r = 0, \ldots, 4$;

(d) $\sup_z |f^{(r)}(z)| < \infty$ for $r = 0, \ldots, 4$;

(e) $0 < \inf_z f(z) \le \sup_z f(z) < \infty$;

(f) $\sup_{m \in M} |\rho^{(j)}(m)| < \infty$ for $j = 0, \ldots, 4$.

Suppose that the kernel function $K(\cdot)$ in (38) satisfies

$\int K(u)\,du = 1, \quad \int u K(u)\,du = 0, \quad \int u^2 K(u)\,du < \infty, \quad \sup_u |K^{(r)}(u)| < \infty$ for $r = 0, \ldots, 4$.

Condition C1 holds under the above conditions. If we choose $b_n \asymp n^{-\alpha}$ for $1/8 < \alpha < (q - 2)/(4q + 16)$, then Condition C2 is satisfied with

(39) $g = 2\alpha \wedge \left( \frac{q}{2q + 4} - \frac{\alpha(q + 4)}{q + 2} - \epsilon \right),$

(40) $\delta = \frac{q}{2q + 4} - \frac{\alpha(2q + 6)}{q + 2} - 2\epsilon$

for any $\epsilon > 0$.



The above Lemma specifies the relation between the bandwidth order $\alpha$ in the kernel estimation (38) and $k^*$ in Theorem 3. By some algebra, we can verify that $g \in (1/4, 1/2]$ and $(2g - 1/2) < \delta < g$ given the above range of $\alpha$ and $q$. We want to point out that the convergence rates of $\hat\eta(\theta)$ (and its derivatives) may be improved, i.e., a larger value of $g$ may be obtained, under more restrictive kernel conditions, see [2], [41].

We next apply Theorems 3 and 4 and Lemma 2 to the previous conditional normal (exponential) example, in which $q$ is shown to be arbitrarily large and $M$ is chosen as a sufficiently large compact subset of $(0, \infty)$. For simplicity, in the table below, we assume that $q = 28$, $b_n \asymp n^{-1/5}$ and $\epsilon = 1/600$, such that $g = 151/600 > 1/4$ and $\delta = 1/20$ according to (39)-(40).
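The arithmetic behind these values can be checked with exact fractions. The sketch below assumes that (39)-(40) read $g = 2\alpha \wedge (q/(2q+4) - \alpha(q+4)/(q+2) - \epsilon)$ and $\delta = q/(2q+4) - \alpha(2q+6)/(q+2) - 2\epsilon$ with $b_n \asymp n^{-\alpha}$; this reading is an assumption, chosen because it reproduces the stated $g = 151/600$ and $\delta = 1/20$. The function name is hypothetical.

```python
from fractions import Fraction


def kernel_rates(q, alpha, eps):
    """Return (g, delta) under the assumed reading of (39)-(40)."""
    q, alpha, eps = Fraction(q), Fraction(alpha), Fraction(eps)
    # Bandwidth range required in Lemma 2: 1/8 < alpha < (q - 2)/(4q + 16)
    if not (Fraction(1, 8) < alpha < (q - 2) / (4 * q + 16)):
        raise ValueError("alpha is outside the range required by Lemma 2")
    base = q / (2 * q + 4)
    g = min(2 * alpha, base - alpha * (q + 4) / (q + 2) - eps)
    delta = base - alpha * (2 * q + 6) / (q + 2) - 2 * eps
    return g, delta


g, delta = kernel_rates(28, Fraction(1, 5), Fraction(1, 600))
print(g, delta)  # 151/600 1/20
```

With these inputs the constraints $g \in (1/4, 1/2]$ and $2g - 1/2 < \delta < g$ hold, since $2g - 1/2 = 1/300$.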



Table 3. Conditional Normal (Exponential) Model (g = 151/600)

                    ψ = 1/2              ψ = 1/3
Construction I      r_1 = 1              r_1 = 2/3
                    k* = 1               k* = 1
Construction II     r_1 = 451/600        r_1 = 251/600, r_2 = 353/600
                    k* = 1               k* = 2

                    ψ = 1/4
Construction I      r_1 = 1/2, r_2 = 1
                    k* = 2
Construction II     r_1 = 151/600, r_2 = 153/600, r_3 = 157/600, r_4 = 165/600,
                    r_5 = 181/600, r_6 = 213/600, r_7 = 277/600, r_8 = 405/600
                    k* = 8

Remark: ψ: convergence rate of $\tilde\theta_n^{(0)}$; r_k: defined by $\|\tilde\theta_n^{(k)} - \hat\theta_n\| = O_P(n^{-r_k})$; Construction I: $\hat I_n$ is constructed by (19); Construction II: $\hat I_n$ is constructed by (20).
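The entries of Table 3 can be reproduced by iterating the rate recursions. The sketch below assumes that Construction I improves quadratically, $r_k = 2 r_{k-1}$ as in (A.27), that Construction II follows (A.28), read here as $r_k = \min(2 r_{k-1} + g - 1/2,\; r_{k-1} + g)$, that $r_0 = \psi$, and that $k^*$ is the first $k$ with $r_k > 1/2$. The function name and the stopping rule are assumptions of this sketch.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def rate_sequence(psi, g, construction, max_iter=50):
    """Exponents r_1, ..., r_{k*}; the list length is k* under the assumed rule."""
    r, rates = Fraction(psi), []
    for _ in range(max_iter):
        if construction == "I":
            r = 2 * r  # quadratic improvement, cf. (A.27)
        elif construction == "II":
            r = min(2 * r + g - HALF, r + g)  # assumed reading of (A.28)
        else:
            raise ValueError("construction must be 'I' or 'II'")
        rates.append(r)
        if r > HALF:  # difference from the efficient estimate is o_P(n^{-1/2})
            return rates
    raise RuntimeError("rate did not exceed 1/2; check that psi + g > 1/2")


g = Fraction(151, 600)
for psi in (Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)):
    for construction in ("I", "II"):
        rates = rate_sequence(psi, g, construction)
        print(psi, construction, "k* =", len(rates), [str(r) for r in rates])
```

Under these assumptions the loop gives $k^* = 1, 1, 1, 2, 2, 8$ for the six cases, in the order printed, which agrees with the table.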



4.2. Penalized Estimation in Semiparametric Models. In many semiparametric models involving a smooth nuisance parameter, it is often convenient and beneficial to perform estimation using penalization, e.g., [28], [40]. Under regularity conditions, penalized semiparametric log-likelihood estimation can yield fully efficient estimates for $\theta$, see (47). In the penalized estimation framework, the value of $k^*$ is shown to relate to the order of the smoothing parameter $\lambda_n$. A surprising result is that $k^*$ iterations are also sufficient for recovering the estimation sparsity in high dimensional data, see the partly linear example below.

In this subsection, we assume that $\eta$ belongs to the Sobolev class of functions $H_k = \{\eta : J^2(\eta) = \int_Z (\eta^{(k)}(z))^2\,dz < \infty\}$, where $\eta^{(j)}$ is the $j$-th derivative of $\eta$ and $Z$ is some compact set on the real line. The penalized log-likelihood in this context is defined as

(41) $\log lik_{\lambda_n}(\theta, \eta) = n\mathbb{P}_n \log lik(\theta, \eta) - n\lambda_n^2 J^2(\eta),$

where $\lambda_n$ is a smoothing parameter. We assume the following bounds for $\lambda_n$:

(42) $\lambda_n = o_P(n^{-1/4})$ and $\lambda_n^{-1} = O_P(n^{k/(2k+1)}).$

In practice, $\lambda_n$ can be obtained by cross-validation [44]. Here, the regularized $\hat S_n(\theta)$ becomes the log-profile penalized likelihood $S_{\lambda_n}(\theta)$:

(43) $S_{\lambda_n}(\theta) = \log lik_{\lambda_n}(\theta, \hat\eta_{\lambda_n}(\theta)),$

where $\hat\eta_{\lambda_n}(\theta) = \arg\sup_{\eta \in H_k} \log lik_{\lambda_n}(\theta, \eta)$ for any fixed $\theta$ and $\lambda_n$. We define the penalized estimate as $\hat\theta_{\lambda_n}$.

The construction of the $k$-step penalized estimate $\tilde\theta_{\lambda_n}^{(k)}$ follows from (18) just with the change of $\hat S_n(\cdot)$ to $S_{\lambda_n}(\cdot)$. For the penalized estimation, we need to slightly modify Condition G as follows:

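G'. Assume that, for some constant $c$,

(44) $\frac{1}{n} S_{\lambda_n}^{(1)}(\theta_0) - c\,\mathbb{P}_n \tilde\ell_0 = O_P(\lambda_n^2),$

(45) $\frac{1}{n} S_{\lambda_n}^{(2)}(\theta_0) + c\,\tilde I_0 = O_P(\lambda_n),$

(46) $\sup_{\theta \in N(\theta_0)} \left| \frac{1}{n} S_{\lambda_n}^{(3)}(\theta) \right| = O_P(1).$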

It is easy to verify Condition G' if $\hat\eta_{\lambda_n}(\theta)$ has an explicit expression and $\log lik_{\lambda_n}(\theta, \eta)$ is smooth w.r.t. $(\theta, \eta)$, see Example 4 below. We also want to point out that Condition G' is relaxable to a large extent, see Remark 2. For example, rather than the explicit form of $\hat\eta_{\lambda_n}$, we may only require $\hat\eta_{\lambda_n}$ to satisfy $\|\hat\eta_{\lambda_n}(\tilde\theta_n) - \eta_0\| = O_P(\|\tilde\theta_n - \theta_0\| \vee \lambda_n)$ for any consistent $\tilde\theta_n$. In view of (6) and (8), we can prove Theorem 5 similarly to Theorem 3.

Theorem 5. Suppose Condition G' holds, the penalized MLE $\hat\theta_{\lambda_n}$ is consistent and $\tilde I_0$ is nonsingular. We have

(47) $\sqrt{n}(\hat\theta_{\lambda_n} - \theta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \tilde I_0^{-1} \tilde\ell_0(X_i) + O_P(\sqrt{n}\lambda_n^2).$

Define $g = \max\{g' : \lambda_n = O_P(n^{-g'})\}$, and thus $1/4 < g \le k/(2k + 1)$ based on Condition (42). Construct $\tilde\theta_{\lambda_n}^{(k)}$ as in (18) with the change of $\hat S_n(\cdot)$ to $S_{\lambda_n}(\cdot)$. Then all the conclusions for $\tilde\theta_n^{(k)}$ in Theorem 3 also hold for $\tilde\theta_{\lambda_n}^{(k)}$.

The above asymptotic linear expansion (47) was also derived in [15] but under very different conditions. Theorem 5 implies that $k^*$ depends on the order of the smoothing parameter $\lambda_n$, i.e., the value of $g$, see (28). Because
of the duality between the penalized estimation and sieve estimation, we 
expect that the above conclusions also hold for the semiparametric sieve 
estimation, see 0|. For example, when rjn is estimated in the form of B- 
spline (local polynomial) as in (0, |jj|) , k* may rely on the growth 
rate of the number of basis functions (the order of bandwidth in the kernel 
function). The detailed theoretical exploration towards this direction is not 
considered in this article due to the space limitation. 

We next apply Theorem 5 to the following partly linear models under high dimensional data. Surprisingly, we discover that a one-step iteration is sufficient for achieving the semiparametric estimation efficiency and recovering the estimation sparsity simultaneously.

Example 4. Sparse and Efficient Estimation of Partial Spline Model

The partial smoothing spline represents an important class of semiparametric models under penalized estimation. In particular, we consider

(48) $Y = W'\theta + \eta(Z) + \epsilon,$

where $\eta \in H_k$ and $0 \le Z \le 1$. For simplicity, we assume that $\epsilon \sim N(0, \sigma^2)$ and is independent of $(W, Z)$. The normality of $\epsilon$ can be relaxed to the sub-exponential tail condition. In this example, we assume that some components of $\theta_0$ are exactly zero, which is common for high dimensional data. It is
well known that effective variable selection in semiparametric models could 
greatly improve their prediction accuracy and interpretability, e.g., (tI [lH]. 
To achieve the estimation efficiency and recover the sparsity of $\theta$, Cheng and Zhang (2010) proposed the following double penalty estimation approach for (48). Specifically, they define $(\hat\theta_{\lambda_n}, \hat\eta_{\lambda_n})$ as the minimizer of

(49) $n\mathbb{P}_n (Y - W'\theta - \eta(Z))^2 + n\lambda_n^2 J^2(\eta) + n\tau_n^2 \sum_{j=1}^{d} \frac{|\theta_j|}{|\tilde\theta_j|^{\gamma}},$

where $\gamma$ is a fixed positive constant and $\tilde\theta = (\tilde\theta_1, \ldots, \tilde\theta_d)'$ is the consistent initial estimate, over $\Theta \times H_k$.

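We will show that $\tilde\theta_{\lambda_n}^{(1)}$ possesses the same semiparametric oracle property, whose definition is given below, as $\hat\theta_{\lambda_n}$. The standard smoothing spline theory suggests that

(50) $\hat\eta_{\lambda_n}(\theta)(z) = A(\lambda_n)(y - w\theta),$

where $\hat\eta_{\lambda_n}(\theta)(z) = (\hat\eta_{\lambda_n}(\theta)(z_1), \ldots, \hat\eta_{\lambda_n}(\theta)(z_n))'$, $y = (y_1, \ldots, y_n)'$ and $w = (w_1', \ldots, w_n')'$. The expression of the $n \times n$ influence matrix $A(\lambda_n)$ can be found in [21]. Therefore, $\hat\eta_{\lambda_n}(\theta)$ is a natural spline of order $(2k - 1)$ with knots on the $z_i$'s for any fixed $\theta$. Plugging (50) back into (49), we have

(51) $S_{\lambda_n}(\theta) + n\tau_n^2 \sum_{j=1}^{d} \frac{|\theta_j|}{|\tilde\theta_j|^{\gamma}},$

where

(52) $S_{\lambda_n}(\theta) = (y - w\theta)'[I - A(\lambda_n)](y - w\theta)$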



and $I$ is the identity matrix of size $n$. When $\tau_n = 0$, the minimizer of (49) becomes the partial smoothing spline, and we denote it as $(\bar\theta_{\lambda_n}, \bar\eta_{\lambda_n})$. Note that $\bar\theta_{\lambda_n}$ has a simple analytic form as $\bar\theta_{\lambda_n} = [w'(I - A(\lambda_n))w]^{-1} w'[I - A(\lambda_n)]y$. However, $\hat\theta_{\lambda_n}$ as the minimizer of (51) does not have an explicit solution
form, and has to be iteratively computed using software like Quadratic Pro- 
gramming or LARS 18], see Section 4 of [if]]. Specifically, based on (fT8|) - ([T9|) . 



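we construct $\tilde\theta_{\lambda_n}^{(1)}$ as follows:

$\tilde\theta_{\lambda_n}^{(1)} = \tilde\theta_{\lambda_n}^{(0)} + [w'(I - A(\lambda_n))w]^{-1} \left\{ w'(I - A(\lambda_n))(y - w\tilde\theta_{\lambda_n}^{(0)}) - \frac{n\tau_n^2}{2}\,\delta_n(\tilde\theta_{\lambda_n}^{(0)}) \right\},$

where $\delta_n(\theta) = (sign(\theta_1)/|\tilde\theta_1|^{\gamma}, \ldots, sign(\theta_d)/|\tilde\theta_d|^{\gamma})'$.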
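To make the one-step construction concrete, here is a minimal sketch of a single Newton-type update for the profiled objective $(y - w\theta)'[I - A](y - w\theta) + n\tau_n^2 \sum_j |\theta_j| / |\tilde\theta_j|^{\gamma}$. Its gradient is $-2w'(I - A)(y - w\theta) + n\tau_n^2\,\delta_n(\theta)$ away from zero coordinates and the Hessian of the quadratic part is $2w'(I - A)w$, which gives the update coded below. The function name and arguments are hypothetical, and the smoother matrix `A` is taken as an input rather than computed from a smoothing spline.

```python
import numpy as np


def one_step(theta0, y, w, A, tau, theta_weights, gamma=1.0):
    """One Newton-type step from theta0 for the doubly penalized objective.

    A is any n x n smoother (influence) matrix, tau plays the role of tau_n,
    and theta_weights is the estimate used in the adaptive penalty weights.
    """
    n = len(y)
    M = np.eye(n) - A
    half_hessian = w.T @ M @ w                # w'(I - A)w
    half_score = w.T @ M @ (y - w @ theta0)   # w'(I - A)(y - w theta0)
    # delta_n(theta0): subgradient of the weighted L1 penalty, per coordinate
    delta = np.sign(theta0) / np.abs(theta_weights) ** gamma
    step = np.linalg.solve(half_hessian, half_score - 0.5 * n * tau ** 2 * delta)
    return theta0 + step
```

When `tau` is zero the update lands on $[w'(I - A)w]^{-1} w'(I - A)y$ from any starting point, because the remaining objective is exactly quadratic in $\theta$.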

Without loss of generality, we write $\theta_0 = (\theta_1', \theta_2')'$, where $\theta_1$ consists of all $q$ nonzero components and $\theta_2$ consists of the remaining $(d - q)$ zero elements, and define $\hat\theta_{\lambda_n} = (\hat\theta_{\lambda_n,1}', \hat\theta_{\lambda_n,2}')'$ accordingly. We assume that $W$ has zero mean, strictly positive definite covariance matrix $\Sigma$ and finite fourth moment. The observations $z_i$'s (real numbers) are sorted and satisfy

(53) $\int_0^{z_i} u(w)\,dw = \frac{i}{n}$ for $i = 1, 2, \ldots, n$,

where $u(\cdot)$ is a continuous and strictly positive function. The above regularity conditions are commonly used in the literature, e.g., [17], [21], and are relaxable. For example, Condition (53) can be weakened to the case in which the $z_i$'s are sufficiently close to a sequence satisfying (53). For simplicity, we assume that $\gamma = 1$ and $\tilde\theta$ is $\sqrt{n}$-consistent. In this example, $\bar\theta_{\lambda_n}$ or the difference based estimate [46], which are both known to be $\sqrt{n}$-consistent, can serve as $\tilde\theta$ or $\tilde\theta_{\lambda_n}^{(0)}$.

In this example, we say $\hat\theta_{\lambda_n}$ satisfies the semiparametric oracle property if

O1. $\sqrt{n}(\hat\theta_{\lambda_n,1} - \theta_1) \to N(0, \sigma^2 \Sigma_{11}^{-1})$, where $\Sigma_{11}$ is the $q \times q$ upper-left submatrix of $\Sigma$ [Semiparametric Efficiency];

O2. $\hat\theta_{\lambda_n,2} = 0$ with probability tending to one [Sparsity].

It is easily shown that $\sigma^2 \Sigma_{11}^{-1}$ in O1 is the semiparametric efficiency bound for $\theta_1$ since $z$ is assumed to be fixed.

Corollary 2. If $n^{k/(2k+1)}\lambda_n \to \lambda > 0$ and $n^{k/(2k+1)}\tau_n \to \tau > 0$, then $\hat\theta_{\lambda_n}$ is $\sqrt{n}$-consistent and satisfies the semiparametric oracle property.
Given that 9 X is y/n- consistent, then \\9 X — 9\ n \\ = Op(n~ ) and 9 X also 
enjoys the semiparametric oracle property. 

The above Corollary is a simple but interesting application of Theorem 5. We can relax its conditions to general $\gamma$ and a non-$\sqrt{n}$-consistent $\tilde\theta_{\lambda_n}^{(0)}$, in which case we may require more than one iteration. The conditions on $\lambda_n$ and $\tau_n$ are also chosen for simplicity of exposition and are relaxable. In addition, the proof of Corollary 2 implies the following special case of (47):

$\sqrt{n}(\hat\theta_{\lambda_n,1} - \theta_1) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Sigma_{11}^{-1} W_{1i}\epsilon_i + O_P\left(n^{-\frac{2k-1}{2(2k+1)}}\right),$

where $W_{1i}$ is the first $q$ elements of $W_i$. It is also possible to extend the conclusions of Corollary 2 to the semiparametric quasi-likelihood framework proposed in [28] after more tedious algebra.

5. Initial Estimate. In this paper, we assume the existence of an $n^{\psi}$-consistent $\tilde\theta_n^{(0)}$, just as numerical results assume that the iterations commence in a suitable neighborhood of $\theta_0$. Occasionally, the semiparametric model structure can be exploited to produce a $\sqrt{n}$-consistent initial estimate, e.g., [37], [46]. However, if such ad-hoc methods are unavailable, a general strategy is to conduct a search of some objective function at finitely many $\theta$-values and call the optimizer the initial estimate. The numerical analysis literature suggests several search strategies for parametric models, e.g., [20], [43], and Robinson (1988) subsequently proved the consistency and convergence rate of those numerical outcomes. In this section, we extend Robinson's results to semiparametric models, i.e., Theorem 6. This extension is nontrivial since our objective function usually has no explicit form and is possibly non-smooth. In fact, our theoretical results on searching for $\tilde\theta_n^{(0)}$ can be applied to any objective function satisfying Conditions I1-I2 below, and are thus of independent interest.

We use the generalized profile likelihood $\hat S_n(\theta)$ as our objective function in semiparametric models. Besides the compactness of $\Theta$ and consistency of $\hat\theta_n$, we have two primary conditions I1-I2 on $\hat S_n(\theta)$ to guarantee the validity of the grid search methods we will consider.

I1. [Asymptotic Uniqueness] For any random sequence $\{\tilde\theta_n\} \in \Theta$,

(54) $[\hat S_n(\hat\theta_n) - \hat S_n(\tilde\theta_n)]/n = o_P(1)$ implies that $\tilde\theta_n - \theta_0 = o_P(1)$.

I2. [Asymptotic Expansion] For any consistent $\tilde\theta_n$, $\hat S_n$ satisfies

(55) $\hat S_n(\tilde\theta_n) = \hat S_n(\theta_0) + n(\tilde\theta_n - \theta_0)'\mathbb{P}_n\tilde\ell_0 - \frac{n}{2}(\tilde\theta_n - \theta_0)'\tilde I_0(\tilde\theta_n - \theta_0) + \Delta_n(\tilde\theta_n),$

where $\Delta_n(\theta) = n\|\theta - \theta_0\|^3 \vee n^{1-2r}\|\theta - \theta_0\|$ and $1/4 < r < 1/2$.

Condition I1 is usually implied by the model identifiability conditions. Note that, in Condition I2, we only assume the existence of the asymptotic expansion (55) and do not assume the continuity of $\hat S_n(\cdot)$. In Section 3, we have shown that the log-profile likelihood $\log pl_n(\cdot)$, as a special case of $\hat S_n(\cdot)$, satisfies I2 under model Assumptions M1-M4, see (A.5). As for the regularized $\hat S_n(\cdot)$, we can verify I2 under Condition G using a three-term Taylor expansion of $\hat S_n$. Specifically, I2 is satisfied if we assume Conditions K1-K2 & C1-C2 (G') for the kernel estimation (penalized estimation). In particular, we can change $n^{1-2r}$ to $n\lambda_n$ in $\Delta_n(\cdot)$ when considering the penalized estimation.

Now we consider two types of grid search: deterministic and stochastic. In the former, we form a grid of cubes with sides of length $sn^{-\psi}$ over $R^d$ for some $s > 0$ and $0 < \psi \le 1/2$, and thus obtain a set of points $D_n = \{\theta_{iD}\}$ regularly spaced throughout $\Theta$ with cardinality $card(D_n) \ge Cn^{d\psi}$ for some $C > 0$. The grid point which maximizes $\hat S_n(\theta)$ is taken as $\tilde\theta_n^{(0)}$. However, this deterministic search could be very slow if the dimension $d$ of $\theta$ is high. This motivates us to propose the stochastic search, in which the search points are the realizations of some independent random variable $\bar\theta$ with strictly positive density around $\theta_0$, e.g., $\bar\theta \sim Unif[\Theta]$. We require that the magnitude of the stochastic search points remains the same no matter how large the dimension $d$ is. In theory, the stochastic grid search has significant computational savings over the deterministic alternative. In Theorem 6 below we rigorously prove that the above numerical outcomes are $n^{\psi}$-consistent for $0 < \psi \le 1/2$.
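The two search schemes can be sketched as follows for a box-shaped parameter space. This is only an illustration under stated assumptions: the helper names, the box bounds and the generic `objective` callable are hypothetical stand-ins, and the code makes no claim about convergence rates.

```python
import itertools

import numpy as np


def deterministic_grid(lower, upper, n, psi, s=1.0):
    """Regular grid over the box [lower, upper] with side length s * n**(-psi)."""
    side = s * n ** (-psi)
    axes = [np.arange(lo, hi + side / 2, side) for lo, hi in zip(lower, upper)]
    return np.array(list(itertools.product(*axes)))


def stochastic_grid(lower, upper, size, rng):
    """`size` independent draws from the uniform distribution on the box."""
    return rng.uniform(lower, upper, size=(size, len(lower)))


def grid_argmax(objective, points):
    """Return the search point with the largest objective value."""
    values = np.array([objective(p) for p in points])
    return points[np.argmax(values)]
```

The number of deterministic grid points grows like $n^{d\psi}$, which is why the regular grid becomes expensive as $d$ increases, while the size of the stochastic search set is chosen directly by the user.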

Theorem 6. Let $D_n$ be a set of points regularly spaced throughout $\Theta$ with $card(D_n) \ge Cn^{d\psi}$ for some $C > 0$ and $0 < \psi \le 1/2$. Assume that $\bar\theta$ is independent of the data and admits a density having support $\Theta$ and bounded away from zero in some neighborhood of $\theta_0$. Let $S_n$ be a set of realizations of $\bar\theta$ with $card(S_n) \ge Cn^{\psi}$ for some $C > 0$ and $0 < \psi \le 1/2$. Suppose that Conditions I1-I2 hold, and that the parameter space $\Theta$ is compact. Then, if $\hat\theta_n$ defined in (6) is consistent and $\tilde I_0$ is nonsingular, we have

(56) $\tilde\theta_n^{D} - \theta_0 = O_P(n^{-\psi}),$

(57) $\tilde\theta_n^{S} - \theta_0 = O_P(n^{-\psi}),$

where $\tilde\theta_n^{D} = \arg\max_{\theta \in D_n} \hat S_n(\theta)$ and $\tilde\theta_n^{S} = \arg\max_{\theta \in S_n} \hat S_n(\theta)$.

Theorems 2-3 together with Theorem 6 above offer a rigorous statistical analysis of the general iterative semiparametric estimation algorithm presented in the Introduction. Those theorems indicate a tradeoff between the computational cost of searching for an initial estimate, i.e., $card(D_n)$ or $card(S_n)$, and that of generating an efficient estimate, i.e., $k^*$. Theorem 6 can be applied to all the examples we have considered. Specifically, Condition I1 is verified for Examples 1-2 in [14], and we can easily verify Condition I1 in Example 3 by adapting the consistency proof of $\hat\theta_n$ in [38], see its Proposition 1. In fact, Conditions I1-I2 are very mild and can be satisfied in a wide range of semiparametric models, e.g., the proportional odds model and penalized semiparametric logistic regression.

APPENDIX 

A.1. Conditions M1-M4 on the Least Favorable Submodel. The LFS in Section 3 is constructed in the following manner. We first assume the existence of a smooth map from the neighborhood of $\theta$ into $H$, of the form $t \mapsto \eta_*(t; \theta, \eta)$, such that the map $t \mapsto \ell(t, \theta, \eta)(x)$ can be defined as follows:

(A.1) $\ell(t, \theta, \eta)(x) = \log lik(t, \eta_*(t; \theta, \eta))(x),$

where we require $\eta_*(\theta; \theta, \eta) = \eta$ for all $(\theta, \eta) \in \Theta \times H$. Thus, $\log pl_n(\theta) = \sum_{i=1}^{n} \ell(\theta, \theta, \hat\eta(\theta))(X_i)$. See [14] for similar constructions. We define $\dot\ell(t, \theta, \eta)$, $\ddot\ell(t, \theta, \eta)$ and $\ell^{(3)}(t, \theta, \eta)$ as the first, second and third derivative of $\ell(t, \theta, \eta)$ with respect to $t$, respectively. Also denote $\ell_{t,\theta}(t, \theta, \eta)$ as $(\partial^2/\partial t\,\partial\theta)\ell(t, \theta, \eta)$.

M1. We assume that the derivatives $(\partial^{l+m}/\partial t^{l}\,\partial\theta^{m})\ell(t, \theta, \eta)$ have integrable envelope functions in $L_1(P)$ for $(l + m) \le 3$, and that the Fréchet derivatives of $\eta \mapsto \ddot\ell(\theta_0, \theta_0, \eta)$ and $\eta \mapsto \ell_{t,\theta}(\theta_0, \theta_0, \eta)$ are bounded around $\eta_0$;

M2. $E\dot\ell(\theta_0, \theta_0, \eta) = O(\|\eta - \eta_0\|^2)$ for all $\eta$ around $\eta_0$;

M3. $\mathbb{G}_n(\dot\ell(\theta_0, \theta_0, \hat\eta(\tilde\theta_n)) - \dot\ell(\theta_0, \theta_0, \eta_0)) = o_P(n^{-2r+1/2} \vee n^{1/2-r}\|\tilde\theta_n - \theta_0\|)$ for any $\tilde\theta_n \to \theta_0$ in probability;

M4. The classes of functions $\{\ddot\ell(t, \theta, \eta)(x) : (t, \theta, \eta) \in V\}$ and $\{\ell_{t,\theta}(t, \theta, \eta)(x) : (t, \theta, \eta) \in V\}$ are P-Donsker, and $\{\ell^{(3)}(t, \theta, \eta)(x) : (t, \theta, \eta) \in V\}$ is P-Glivenko-Cantelli, where $V$ is some neighborhood of $(\theta_0, \theta_0, \eta_0)$.

See Section 2.2 of [14] for the discussions on M1-M4.



A.2. Useful Lemmas. The first two Lemmas are used in the proof of Lemma 1. Lemmas A.3, A.4, A.5 and A.6 are used in the proofs of Theorem 3, Corollary 1, Theorem 4 and Corollary 2, respectively.



Lemma A.l. Suppose that Conditions M1-M4 and luu\) hold. If n is 
-consistent, then we have 

(A.2? n (9 n ,s n ) = FJo + Op (n-* V \s n \ V 9r{n ^ V |Sn|) 



\ n\s n \ 

+ U n , S n ) = £ n (0n, S n ) ~ hU n 

5r(Mv||^)VnV2-2r\ 



(A.3) +0 P 

In{9mt n ) = Io 

(A.4) +O p 



n s r 



'g r (\\8n - 9n\\ V \t n \) V nt n \\9 n - 9 n \\ V n 1 / 2 - 2 ^ 



where g r {t) = nt 3 V n 1_2r i and U n = Op(n~ s ) for some s > 0. 



Proof: Under the assumptions M1-M4 and (10), [14] proved the following asymptotic expansions of $\log pl_n(\tilde\theta_n)$, where $\tilde\theta_n$ is consistent,

(A.5) $\log pl_n(\tilde\theta_n) = \log pl_n(\theta_0) + (\tilde\theta_n - \theta_0)' \sum_{i=1}^{n} \tilde\ell_0(X_i) - \frac{n}{2}(\tilde\theta_n - \theta_0)'\tilde I_0(\tilde\theta_n - \theta_0) + O_P(g_r(\|\tilde\theta_n - \theta_0\|)),$

(A.6) $\log pl_n(\tilde\theta_n) = \log pl_n(\hat\theta_n) - \frac{n}{2}(\tilde\theta_n - \hat\theta_n)'\tilde I_0(\tilde\theta_n - \hat\theta_n) + O_P(g_r(\|\tilde\theta_n - \hat\theta_n\|) \vee n^{1/2-2r}).$

We first prove (A.3). (A.6) implies that



log pl n (0 n + V n + s n Vi) = log pl r 



v 



{Vn + S n Vi)'l (V n + S n Vi 



+0 P (g r (\s n \ y\\V n \\) Vn 1 / 2 " 2 ''), 



log pl n (6 n + V n 



logpl n (9 n ) - -VfioV n + P (g r (\\V n \\) Vn 1 / 2 - 2 "), 



for any random vector V n = op(l) and s n —> 0. Combining the above two 
expansions and (fT2|) . we have 



«n(fn + K, Sn) i = ;rV0«i ~ VoKi + Op : : . 

2 \ n|%| y 

By taking = and C/ n , respectively, in the above equation, we have 
proved (|A.3j) . Following similar analysis in the above, (|12[) Sz (|A.5j) yield 
(|AT2|) , and ^T3J & (|A~6|) yield flX3J>. This completes the whole proof. □ 



Lemma A. 2. Suppose that Conditions M1-M4 and p0\) hold. If 

(A.7) /n(^ 1} ,tn) " Jo = Op^" 1 )), 

i/ien we /ictue ||# n — n || = 

Op (klVll^-flnllr^V 

5r(|«n| Vlft^-M) VnV2-2r\ 



(A. 



n 



for k = 1,2, 

Proof: Based on ([13]) . we have 

/ n (^- 1 ),t n )^(^ ) -^n) = [v^Jn^^U^^-^J+V^ 

(A.9) + [V^(4(% fc-1) , *n) " 4(«n, 

The second term in (|A.9p equals to 

^(KDvn 1 / 2 -^^ 



n V v n> °n ) 



Op yfn\s n \ V 
according to (12) and (A.6). The third term in (A.9) can be written as

V(Nn| y\\ek k - 1] -e n \\) vn^A 



Vn\s n \ 



for A; = 1, 2, ... by replacing U n with ^ — 9 n ) in (|A.3p . Combining the 
above analysis, the assumption ()A.7p and nonsingularity of Jo; we complete 
the proof of ([AH . □ 

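Lemma A.3. Suppose that Condition G holds. If $\tilde\theta_n$ is an $n^{\psi}$-consistent estimator for $0 < \psi \le 1/2$, then we have

(A.10) $n^{-1}\hat S_n^{(1)}(\tilde\theta_n) = n^{-1}\hat S_n^{(1)}(\theta_0) - \tilde I_0(\tilde\theta_n - \theta_0) + O_P((n^{-g} \vee \|\tilde\theta_n - \theta_0\|)\|\tilde\theta_n - \theta_0\|),$

(A.11) $n^{-1}[\hat S_n^{(1)}(\tilde\theta_n + U_n) - \hat S_n^{(1)}(\tilde\theta_n)] = -\tilde I_0 U_n + O_P((n^{-g} \vee \|\tilde\theta_n - \theta_0\|)\|U_n\|),$

where $U_n$ is a statistic of the order $O_P(n^{-s})$ for some $s \ge \psi$.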

Proof: We first consider (|A.10p , Using a Taylor's expansion, we have 

-S^(9 n ) = ^si 1 \9 ) + -S^(9o)(9 n -9 ) 
n n n 

1 ~ *? (f)*\ ~ 

+-(0n-9 o )®^^-®(9 n -0o) 

2 n 



I5W(0 O ) + A + B, 
n 



where 9[ lies between 9 n and 9q. I n view of (JHJ) and (|22p . we have A = 
-Io(0n-0o) + Op(n- s ||0 n -0 o ||). Condition QEft implies that 5 = P {\\9 n - 
9q\\ 2 ). This completes the proof of ()A.10p . We next consider (|A.lip . Similarly, 
we have [S^ (9 n + U n ) - S^ ] {9 n )]/n 

= -S^(9 n )U n + P (\\U n \\ 2 ) 
n 

= -S^(9 n )U n + P (n- 9 \\U n \\ V ||f4|| 2 ), 
n 



-S^(9 )U n + P {\\9 n - 9 Q \\\\U n \\ V n-<>\\U n \\ V \\U n \\ 2 ), 
n 

hU n + P {n~ l l 2 \\U n \\ V \\9 n - e Q \\\\U n \\ V n- g \\U n \\ V \\U n \\ 2 ), 



where the second equation follows from ()22[) , the third equality follows from 
(|23p and the last equation follows from CLT and ([8]) . Considering that 1/4 < 
g < 1/2 and s > tp, we have proved (jA.lip . □ 
Lemma A.4. Let $S_n(\theta) = \sum_{i=1}^{n} \ell_\theta(X_i)$. Suppose that $\tilde\theta_n$ is an $n^{\psi}$-consistent estimator for $0 < \psi \le 1/2$. If $\ell_\theta(\cdot)$ satisfies P1 & P2, we have



(A.12) 



-Io(On-9o) + Op(\\e n -e \\ 2 ) 
i[sU(8 n + U n )-SU(9 n )] 



n 



(A.i3) = -i u n + Op(\\e n -e \\\\u n \\), 

where U n a statistic of the order Op(n~ s ) for any s > ip. 

Proof: We only provide the proof of (|A.13P since that of (|A.12p is com- 
pletely analogous and simpler. To show (1A.13|) . it suffices to prove that, for 
every C±, > and s > tp, 



sup 

\t\<Ci,\u\<C 2 



n 



S« (e + n-*t + n~ s u) - S« (6 + n^t) 



+ n s Iqu 



Denote Z n (t,u) = n~ 1 / 2 [S^\9 + n^t + n~ s u) - s£ ] (e + n^'t)} and 
Z®(t,u) = Z n (t,u) — EZ n (t,u). Then, it suffices to show that 

(A.14) sup \Z°(t,u)\ = Opin 1 / 2 ^- 8 ), 

\t\<Ci,\u\<C 2 

(A.15) sup \EZ n (t,u) + n 1/2 - s I u\ = Opin 1 ' 2 ^- 8 ). 

|t|<Ci,|u|<C 2 

The proofs of (IA.14P and (|A.15|) are similar as those of (2.3) and (2.4) in 
Page 1224 of [24|, and are thus skipped. □ 

Lemma A. 5. Suppose Conditions K1-K2 & C1-C2 hold. Then we have 



1 n ( d \ 

* i=l 



(A.17) 
(A.18) 



n 



fn(0 O ) 



P (n~ s ), 

P (n~ s ), 
Opin 1 ' 2 - 2 ^ 



where r n (8) = S n (8) - S n (6) - Y%=i A 6,ri*(9)[v(6) ~ V* 
Proof: The proof of Lemma 2 in [38] directly implies (A.16) and (A.17). As for (A.18), by Taylor expansion, we first rewrite



r„(0) 



E 

i=l 



d~ log lik 

d\ 2 



{Xi;e, m (0)(Zi))dt{r}(e)(Zi) - v*(0)(Zi)}' 



1 71 

5 E Qo(x%){W)( z i) - v*(Q)(Zi)¥ 



i=l 



where r) t {d){Zi) = i7*(0)(Zi) + *(»ft0)-»7*(0))(2i). To prove (fA"T8|) . it suffices 
to show that 



(A. 19) sup 



2=1 



P (1) for j = 0,1 



in view of (|34p . For j =0, we have 

<9 2 log lik 



\Qe (x)\ < sup 



3A 2 



(x;6» , A) 



Op(l) for all z E Z 



based on the smoothness Condition K2. The case j = 1 can be established 
similarly. □ 

Lemma A.6. Let $\eta_0(z) = (\eta_0(z_1), \ldots, \eta_0(z_n))'$ and $\epsilon = (\epsilon_1, \ldots, \epsilon_n)'$. If $\lambda_n \to 0$, then we have

(A.20) $w'A(\lambda_n)\epsilon = O_P(\lambda_n^{-1/(2k)}),$

(A.21) $w'[I - A(\lambda_n)]\eta_0(z) = O_P(n^{1/2}\lambda_n),$

(A.22) $w'(I - A(\lambda_n))w/n = \Sigma + O_P(n^{-1/2} \vee n^{-1}\lambda_n^{-1/k}).$



Proof: We first state Lemmas 4.1 and 4.3 in [17]:

(A.23) $n^{-1} \sum_{i=1}^{n} [((I - A(\lambda_n))\eta_0(z))_i]^2 \le \lambda_n^2 J^2(\eta_0),$

(A.24) $tr(A(\lambda_n)) = O(\lambda_n^{-1/k}),$

(A.25) $tr(A^2(\lambda_n)) = O(\lambda_n^{-1/k}).$

Since $Var[(w'A(\lambda_n)\epsilon)_i] = \sigma^2 \Sigma_{ii}\, tr(A^2(\lambda_n))$, we can show that $[w'A(\lambda_n)\epsilon]_i = O_P(\lambda_n^{-1/(2k)})$ based on (A.25), which proves (A.20). We next consider (A.21) by establishing that $Var[w'\{I - A(\lambda_n)\}\eta_0(z)]_i = \Sigma_{ii}\, \eta_0'(z)[I - A(\lambda_n)]^2 \eta_0(z)$. Then, we can prove (A.21) by (A.23). As for (A.22), we first write its left-hand side as the sum of

$\Sigma + (w'w/n - \Sigma) - w'A(\lambda_n)w/n,$

where the second term is $O_P(n^{-1/2})$ based on the central limit theorem. For the last term, writing $W_{(i)}$ for the $i$-th component of $W$, we have

$E\{[w'A(\lambda_n)w]_{ij}\}^2 = (\Sigma_{ij})^2 (tr(A(\lambda_n)))^2 + (\Sigma_{ii}\Sigma_{jj} + (\Sigma_{ij})^2)\, tr(A^2(\lambda_n)) + (E(W_{(i)}W_{(j)})^2 - 2(\Sigma_{ij})^2 - \Sigma_{ii}\Sigma_{jj}) \sum_{r} A_{rr}^2(\lambda_n)$

for $i \neq j$. When $i = j$, we have $E|(w'A(\lambda_n)w)_{ii}| = \Sigma_{ii}\, tr(A(\lambda_n))$. By considering (A.24)-(A.25), we have proved (A.22). □



A. 3. Proof of Lemma{J\ By (|A.4p in Lemma lA.ll and (|A.8P in Lemma [A.21 

n ||) Vn^H^-^IIVn 1 ^ 



- — Y i- \ 

we obtain that (# n — 9 n ) 




n -3r fe _! y n -2r-r fc _! v n -l/2-2r 
, r 3rn VT j-2r^_ lvr l/2-2r > 

xn"^- 1 V ^ V | a J* 1} | V n" 2r 



Op (f^M'-^Vh^^-^Vn 



2r 



To analyze the above order, we have to consider three different stages: (i) 
r k~i < r ; (h) r < r fc-i < 1/2; (hi) > 1/2. For the stage (i), the small- 
est order of fk-i, i-e., n~ Srk ~ 1 ^ 2 , is achieved by taking |i„ ^| x n~ Vk - 1 ^ 2 , 
and the smallest order of hk-i, i-e., ra~ 3rfe - 1 / 2 , is achieved by taking |s n | x 
n~ 3rfe ~ 1 / 2 . For the stage (ii), the smallest order of fk-i, i-e., n~ 3rfe ~ 1 / 2 , is 
achieved by taking |i n | x n r "fc-i/ z ) an d the smallest order of /ifc-i 5 i-e-, 
n -(2r+r fc _x)/2^ - g ac hi evec i Dy taking |si fc_1 ^| x n~( 2r+rfe - 1 )/ 2 . For the last 
stage (iii), the smallest order of fk-i, i-e., n _3r ' fe - 1 / 2 , is achieved by taking 
| in | X n _rfc - 1 / 2 , and the smallest order of h^-i, i.e., n -1 " -1 / 4 , is achieved 
by taking \sn ^| X n" 7 *" 1 / 4 . This completes the whole proof. □ 
A.4. Proof of Theorem 2. According to the proof of Lemma 1, we also need to consider the stochastic order of $\|\tilde\theta_n^{(k)} - \hat\theta_n\|$ in terms of three stages:
(i) r^-i < r; (ii) r < r^^\ < 1/2; (hi) r^-i > 1/2. In stage (i), we have 
\\W - n \\ = OpiWetf-^ -Jnfl 2 ) = P {n- s ^ k )) if k < Kxty,r). In 
stage (ii), we have — 9 n \\ = Op{\\$h ^ — 9 n \\ 1 ^ 2 ?i~ r ), which implies 
that 0a ] - 6 n \\ = P {n- S2 ^^) if r < tp < 1/2. It is easy to show that 
S2(if},r,k) > 1/2 if k > K2(ip,r,l/2). In the last stage (hi), we obtain the 
the smallest order of ||#n — 9 n \\, i.e., Op(n _r_1//4 ). Combining the above 
analysis of (i)-(iii), we can conclude that the stochastic order of \\9$ — 9 n \\ 
is continuously improving till the optimal bound Op(n _r_1 / 4 ) and can be 
expressed as Op(n~ s ^' r ' k ^). (|16p also follows from the above analysis. □ 

A. 5. Proof of Theorem^ We first show (|26h by applying Lemma IA.3I 
In (|A.10|) . we replace 9 n by 9 n . Since 9 n is assumed to be consistent and 9q 
is an interior point of 0, we have Sn\9 n ) = 0. By ([6]) and (]2ip . we have 

(A.26) V^(6 n - 9 ) = v^/^ttVo + P (n 1/2 - 29 V n 1/2 \\9 n - 9 



2, 



given that 9 n is consistent and Iq is nonsingular. Considering the range of g, 
we can show 9 n is actually -^/n-consistent, and thus simplify (|A.26|) to (|26p. 
We next show ([27]). By {JT8]) , we can write y /nI n (d^ ) )(eY ) - 9 n ) as 

V^U^W^ ~ On) + n 1/2 (4(^ 0) ) - £ n (9n)) 

= V^U0 n O) )(0 n O) - On) + n-^ 2 S^{9^){9^ - 9 n ) + P (^\\9 n - 9^\\ 
= O P (M\0n-0^\\ 2 ) 

under Condition G. Further, by (|22p and ()23|) . we have the invertibility of 
/ n (^ 0) ) based on that of I . This implies 6$ - 9 n = Op(|ft 0) - 9 n \\ 2 ). By 
the induction principal, we can thus show 

(A.27) 0« -9 n = P (\\9tf-V - 9 n \\ 2 ) for any k > 1. 

([27]) follows from (TA37]) trivially. 

To show ([2"5]) . we first prove — # n || = 

1/2— gilSXfe— 1) _ a i|2 v/ „-9im(fe-l) _ a 



(A.28) Op (^n 1 ^" 3 !!^-^ - 9 n f V n 

By replacing 9 n and ?7 n with # n and 0h ^ — 9 n ) in (jA.lip . respectively, we 
establish that n" 1 / 2 ^ {9n k ~ 1] ) - S$p(0 n )] = 



(A.29) - V^h(0^- l) ~ On) + CMn 1 / 2 "!^- 1 ) - 
Similarly, by setting 9 n as 6 n , and then setting U n as (9n k ^ — 6 n +n~ l / 2 tiVj)
and (dli ^ — 6 n + n~ 1 / 2 t2Vj) in (jA.lip . respectively, we have that 

(A.30) [tityf-^h = [Ioh + Op(n l ' 2 -3 H^- 1 ) - 9 n \\ V 

when ^ is defined in (|20p . Following similar logic in analyzing ()A.27p . 
we can obtain §AJM by considering ([AT2T)1) - (lAT30jl . Next we will show that 
()A.28|) implies (|28p by the following analysis. Based on ()A.28P we have 

(A.31) 

'Op(n-s|ft* _1) - 9 n \\) if - e n \\ = P {n- 1 ' 2 ), 

It is easy to show that - n \\ = P (n" 1 / 2 ) and - 

# n || _1 = Op(n 1 / 2 ). In other words, if k < Li(ip, g), then we have the relation 
that ||^ fc) - n || = Op^ 1 / 2 ^ - n || 2 ) based on ([Oil . This implies 
the form of Ri(ip,g, k) in (|25p . Note that R\{ip,g, k) is an increasing function 
of k under the condition that r ip + g > 1/2. After L\{ij),g) iterations, we have 

(A.32) \\6^>9)) - 9 n \\ = P {n- Rl ^' 9 ' Ll ^' 9 ») = P (n- 1/2 ). 

Thus, we have the relation that — 9 n \\ = Op{n~ 9 \\9n k ^ — n \\) for 

k > (Li(ifj,g) + 1) based on ()A.3ip . Combining this relation with (]A.32p . we 
can show the form of B,2(ip,g,k) when k > L\{ip,g). Since R(ijj,g,k) is an 
increasing function of k given that 1/2 — g < tp < 1/2, the stochastic order 

of On — n \\ is continuously decreasing as k — > oo. The calculation of k* 
also follows from the above analysis. □ 

A. 6. Proof of Theorem^ We first consider (|2ip by rewriting its LHS as 

1 d 

-m Wo 
n ov 

_i=l 

where r n {9) is defined in Lemma lA.51 Therefore, we have 
n- 1 [S^(9o)-S^(0 o )} 

i=i ^ i=i 
P (n- 29 ) 



J2A 9jV4e) [r)(9)-V*m(Xi)+r n (9) 




by Lemma A.5 and the condition that $\delta > (2g - 1/2)$. As discussed previously,
we will show (22) together with (24). By Taylor expansion, we have



n r \ 



s„m - s„(e) 



£ 

t=i 



d log lik 
dX 



{XiiOMOKZiVdtlvWiZj-ri^eXZi)] 



i=l 



where $\eta_t(\theta) = \eta^*(\theta) + t(\hat\eta(\theta) - \eta^*(\theta))$. Hence, to prove (22) and (24), it
suffices to show that



it 



(A. 33) sup sup 



i=l 



P (1) for j = 0,1,2,3 



in view of (34) and (35). Considering the smoothness Condition K2, we can
prove (|A.33j) using the same approach as in the proof of fAlfll) . 
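Finally, it remains to show that the class of functions

$\{(\partial^3/\partial\theta^3) \log lik(x; \theta, \eta^*(\theta)) : \theta \in \mathcal{N}(\theta_0)\}$

is P-Glivenko-Cantelli and that

(A.34) $\sup_{\theta \in \mathcal{N}(\theta_0)} E|(\partial^3/\partial\theta^3) \log lik(X; \theta, \eta^*(\theta))| < \infty$.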

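Let $\ell^{(3)}(\theta, \eta(\theta)) = (\partial^3/\partial\theta^3) \log lik(x; \theta, \eta(\theta))$. For any $\theta_1, \theta_2 \in \mathcal{N}(\theta_0)$, we
have

$|\ell^{(3)}(\theta_1, \eta^*(\theta_1)) - \ell^{(3)}(\theta_2, \eta^*(\theta_2))|$

$\le \sup_{\theta,\lambda} \left|\frac{\partial \ell^{(3)}}{\partial\theta}(\theta,\lambda)\right| \|\theta_1 - \theta_2\| + \sup_{\theta,\lambda} \left|\frac{\partial \ell^{(3)}}{\partial\lambda}(\theta,\lambda)\right| \|\eta^*(\theta_1) - \eta^*(\theta_2)\|_\infty$

$\le \sup_{\theta,\lambda} \left|\frac{\partial \ell^{(3)}}{\partial\theta}(\theta,\lambda)\right| \|\theta_1 - \theta_2\| + \sup_{\theta,\lambda} \left|\frac{\partial \ell^{(3)}}{\partial\lambda}(\theta,\lambda)\right| \sup_{\theta \in \mathcal{N}(\theta_0)} \|\eta^{*(1)}(\theta)\|_\infty \times \|\theta_1 - \theta_2\|$

$\le A\|\theta_1 - \theta_2\|$.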



By Condition K2 and $\sup_{\theta \in \mathcal{N}(\theta_0)} \|\eta^{*(1)}(\theta)\|_\infty < \infty$ in Condition C1, we know
that $EA^2 < \infty$. Thus, by the P-G-C preservation Theorem 9.23 of [26] and
compactness of $\mathcal{N}(\theta_0)$, we know that

$\{(\partial^3/\partial\theta^3) \log lik(x; \theta, \eta^*(\theta)) : \theta \in \mathcal{N}(\theta_0)\}$

is P-Glivenko-Cantelli. The last condition (A.34) follows from Conditions
K2 and C1 by some algebra. □
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A. 7. Proof of Lemma\M Let 

Y^=iMYi,W t )K((z - Zi)/b n ) 



fhg(z) 



Note that rf(9)(z) = p{fhe{z)) b y (1381) . Correspondingly, we have y*(9)(z) = 
p(mg(z)) based on Lemma 7 of [38(. Following the proof of Lemma 8 in [381 ] . 



we can derive that 



(A.35) 



dz k d0i 



jfhe(z) 



Qk+O 

dz k d6i 



sup 

eee 

/ , -fc-2±* 

Op(n-^b n q+2 n e Vb 2 n 



jmg(z) 



for any e > 0, k = 0, 1 and j = 0,1,2,3. Considering ([38]), (TOH]) and 
Condition (f), we can show that 



(A.36) sup \\rf s) (9)-r)i s \9)\\ oc = P [n~*+*bn 9+2 n e V b: 
eeM(e ) 

for s = 0, 1, 2, 3 after some algebra. Following similar logic, we show that



a q+A 

9+2 w .2 



(A.37) 
(A.38) 



9 ~ ( \ 9 t ^ 



q _29±6 

P I n~^b n 9+2 n e Vbl 



2q + 6 

9+2 n e V bl 



Op ( n 2q+4b r 



Considering (A.36)-(A.38), we complete the whole proof. □

A. 8. Proof of Corollary [H For the y/n consistency of 9\ n , it suffices to 
show that, for any given e > 0, there exists a large constant M such that 



(A.39) 



P{ inf A n (s) >0J> > 1-e, 

||=M 



where A n (s) = [S\ n (9 + n~ 1 / 2 s) - S\ n (0 o )]. According to ([EE]), we have 



y 0j 



n 



-1/2, 



A n (s) > S Xn (9 + n~ l ' 2 S ) - S Xn (9 ) + nr n 2 £ 

where Sj is the j-th element of s. The Taylor expansion further gives 
A n ( S ) > n-y 2 S 'S^(9 )+ 1 - S '[S^(9 )/n] S 

1 in , —1/2, 



Ay I 



(A.40) 
where $S^{(j)}_{\lambda_n}(\theta_0)$ represents the j-th derivative of $S_{\lambda_n}(\theta)$ at $\theta_0$. Based on (52),
we have

(A.41) $S^{(1)}_{\lambda_n}(\theta_0) = -2w'[I - A(\lambda_n)](y - w\theta_0)$,

(A.42) $S^{(2)}_{\lambda_n}(\theta_0) = 2w'[I - A(\lambda_n)]w$.

Lemma A.6 implies that

(A.43) $S^{(1)}_{\lambda_n}(\theta_0) = O_P(n^{1/2})$,

(A.44) $S^{(2)}_{\lambda_n}(\theta_0) = O_P(n)$

since $\lambda_n$ is required to converge to zero. Hence, the first two terms on
the right-hand side of (A.40) have the same order, i.e., $O_P(1)$. The second
term, which converges to some positive constant, dominates the first one by
choosing sufficiently large $M$. The third term is bounded by $n^{1/2}\tau_n^2 M_0$ for
some positive constant $M_0$ since $\tilde\theta_j$ is the consistent estimate for the nonzero
coefficient. Considering that $\sqrt{n}\tau_n^2 \to 0$, we have shown the $\sqrt{n}$-consistency
of $\hat\theta_{\lambda_n}$.

To complete the proof of other parts, we first need to show 
(A-45) 02 l -K\\=Op{n- 1 ) 

based on Theorem [5j And then we will verify Condition C for the case 
$c = -2$. It is easy to show that $\mathbb{P}_n\tilde\ell_0 = w'\epsilon/n$ and $\tilde I_0 = \Sigma$ in this example.
To verify (44), we have

-S^(9 ) + 2P n I 
n " 

= Jl w '(j _ A(X n )) m (z) + -w'A{X n )e + T 2 J n (9 Q ) 
n n 

where the second equality follows from Lemma A.6 and the fact that $\delta_n(\theta_0) = O_P(1)$.
Considering the conditions on $\tau_n$ and $\lambda_n$, we have proved (44). (45)
follows from (A.42) and (A.22), and (46) trivially holds. Having shown the
consistency of $\hat\theta_{\lambda_n}$ and verified Condition C, we are able to show (A.45).

For any sequence of estimates $\tilde\theta_n$, the arguments below show that $\tilde\theta_{n,2} = 0$
with probability tending to one if it is $\sqrt{n}$-consistent. For any $\sqrt{n}$-consistent
estimator, it suffices to show that

(A.46) $S_{\lambda_n}\{(\theta_1, 0)\} = \min_{\|\theta_2\| \le Cn^{-1/2}} S_{\lambda_n}\{(\theta_1, \theta_2)\}$

for any $\theta_1$ satisfying $\|\theta_1 - \theta_{01}\| = O_P(n^{-1/2})$, with probability approaching
1. In order to show (A.46), we need to show that $\partial S_{\lambda_n}(\theta)/\partial\theta_j < 0$ for
$\theta_j \in (-Cn^{-1/2}, 0)$ and $\partial S_{\lambda_n}(\theta)/\partial\theta_j > 0$ for $\theta_j \in (0, Cn^{-1/2})$ hold when
$j = q+1, \ldots, d$ with probability tending to 1. By a two-term Taylor expansion
of $S_{\lambda_n}(\theta)$ at $\theta_0$, $\partial S_{\lambda_n}(\theta)/\partial\theta_j$ can be expressed in the following form:

dSxAO) ggUgo) , ^ d 2 S Xn (9 ) , Q _ 2 lxsign(9 j ) 

~^9~ = -^9— + ^ deM {9k ~ 9ok) + nTn — \f\ — ' 

3 3 k=1 3 K 

for $j = q+1, \ldots, d$. Note that $\|\theta - \theta_0\| = O_P(n^{-1/2})$ by the above construction.
Hence, we have

%W= 0p („V* ) + sia „ ft) !^ 

09j \0j\ 

by (A.43) and (A.44). We assume that $n^{k/(2k+1)}\tau_n \to \tau_0 > 0$, which implies
that $\sqrt{n}\tau_n^2/|\tilde\theta_j| \to \infty$ for $\sqrt{n}$-consistent $\tilde\theta_j$ and $j = q+1, \ldots, d$. Thus, we show
that the sign of $\theta_j$ determines that of $\partial S_{\lambda_n}(\theta)/\partial\theta_j$. The above arguments
apply to #a„,2 and 9 X 2 since both of them are proven to be yfn consistent 
in view of the previous discussions, i.e., ()A.45|h 
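
As a sanity check on the rate arithmetic in this step, the sketch below tracks exponents of $n$ only. It is an illustration rather than part of the proof, and it rests on two labeled assumptions: the tuning parameter is taken to be exactly $\tau_n = n^{-k/(2k+1)}$ (reading the assumption above as $n^{k/(2k+1)}\tau_n$ tending to a positive constant), and the initial estimate of a zero coefficient is taken to be of exact order $n^{-1/2}$. The function name is hypothetical.

```python
from fractions import Fraction


def penalty_rate_exponents(k):
    """Exponents of n for the penalty terms (illustrative sketch).

    Assumes tau_n = n^{-k/(2k+1)} exactly and |theta_tilde_j| of exact
    order n^{-1/2} for a zero coefficient; both are simplifying assumptions.
    """
    tau = Fraction(-k, 2 * k + 1)            # tau_n = n^tau
    sqrt_n_tau2 = Fraction(1, 2) + 2 * tau   # sqrt(n) * tau_n^2
    # Dividing by |theta_tilde_j| ~ n^{-1/2} adds 1/2 to the exponent.
    ratio = sqrt_n_tau2 + Fraction(1, 2)     # sqrt(n) * tau_n^2 / |theta_tilde_j|
    penalty_grad = ratio + Fraction(1, 2)    # n * tau_n^2 / |theta_tilde_j|
    return sqrt_n_tau2, ratio, penalty_grad


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, penalty_rate_exponents(k))
```

For k = 2 the three exponents are -3/10, 1/5 and 7/10: $\sqrt{n}\tau_n^2$ vanishes, the ratio $\sqrt{n}\tau_n^2/|\tilde\theta_j|$ diverges, and the penalty part of the derivative grows faster than the $O_P(n^{1/2})$ remainder, so its sign prevails.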

Now it remains to show the semiparametric efficiency of 9\ n ^\, which 
immediately implies that of 9^ 1 based on (1A.45|) . Since we have shown 
#A n ,2 = 0, we can establish that 

(A.47) ^%Pl M * w ,o)=0 for aiiyj = !,...,* 

with probability tending to one. Let $w_1$ denote the first $q$ columns of $w$.
Applying Taylor expansion to (A.47) around $\theta_0$, we obtain

M0x n ,i~0i) = v^(-wi[/-yl(A n )]wil -^[[1 - A(X n )] fao(z) + e) 

[ n J n 

+0 P (v^r 2 ) 

+0 ? (^Vn- 1 /V /(a) VA n ) 
1 ™ 

V n , 

v 1=1 

based on (A.41) and (A.42). This completes the whole proof. □



Wi e 

n 
A.9. Proof of Theorem^ Define $\mathcal{N}_n = \{\theta : \|\theta - \theta_0\| \le Mn^{-\psi}\}$ and $\mathcal{N}_n^c$
as its complement for any $0 < M < \infty$. Note that $\mathcal{D}_n \cap \mathcal{N}_n \neq \emptyset$ for large
enough $M$ and $\mathcal{D}_n \cap \mathcal{N}_n^c \neq \emptyset$ for large enough $n$. We first consider (56). For
sufficiently large $M$ and any $C_1 > 0$, we have

P {6% GAC) = P G A/£ and 6 iD G A" n for some i) 



< P max <S n (0) < max S n (8) 

\v n nAf n T> n rW° 

< P ( max S n (9) < S n (8 ) - Cm^A 
+P\\ max SJO) < max SJ9) 



n ( max S n (6) > S n (8 ) - Cm l ~ 2 A) 



< P max n- 1 /2(s n (^) _ s n (6 )) < -C x n x l 2 ~ 2 ^ 

n {6° is consistent}) 
+P fmaxn- 1 / 2 (5 n (0) - S n (0 o )) > -Cm 1 ' 2 ^ 

+P (0° is inconsistent) 

< I + 11 + III, 

where $\theta_n^{\circ} = \arg\max_{\mathcal{D}_n \cap \mathcal{N}_n} S_n(\theta)$.

The definition of $\mathcal{N}_n$ implies $III \to 0$ for any $M$ as $n \to \infty$. We next
analyze the term $I$ as follows. In view of (55) and the definition of $\mathcal{N}_n$, we
have that
/ = p^e:-e yFj -^(e:-eoyio(e:-e ) + n- 1 / 2 A n (e° n ) 

< _ Cin l/2-2^^ 

< P (ll^pJolHIC - Oo\\ + (S max V^/2)\\e° n - 8o\\ 2 + ||n- 1 / 2 A„(C)|| 

> Cm 1 / 2 ' 2 ^) 

< P (||v^Pn4|| > 9±^EMlll n V*-i> + 0p {n^)\ 

< I 

where $\delta_{max}$ is the largest eigenvalue of $\tilde I_0$, and the second inequality follows
from the definitions of $\mathcal{N}_n$ and $\Delta_n$, and the range that $2r > 1/2 > \psi > 0$.
Denote $\theta_n^* = \arg\max_{\mathcal{D}_n \cap \mathcal{N}_n^c} S_n(\theta)$. We will show $II \to 0$ by first decomposing it
as $II_1 + II_2$, where

$II_1 = P(\{n^{-1/2}(S_n(\theta_n^*) - S_n(\theta_0)) > -C_1 n^{1/2-2\psi}\} \cap \{\theta_n^*$ is consistent$\})$,
$II_2 = P(\{(S_n(\theta_n^*) - S_n(\theta_0)) > -C_1 n^{1-2\psi}\} \cap \{\theta_n^*$ is inconsistent$\})$.

Note that we can write $n^{-1/2}\Delta_n(\theta_n^*)$ as $\sqrt{n}\|\theta_n^* - \theta_0\|^2\epsilon_{1n} + \sqrt{n}\|\theta_n^* - \theta_0\|\epsilon_{2n}$,
where $\epsilon_{1n} = o_P(1)$ and $\epsilon_{2n} = o_P(n^{-1/2})$, in the event that $\{\theta_n^*$ is consistent$\}$.
Thus, according to (55), we can write $II_1$ as

9 )%(0*-0 ) 



P((0* n - e yy/nP n £o + V^\K - eo\\e 2n > ^(0* 



.^ l \\ei-e Q \\\ ln -c 1 n 1 i 2 -^) 



< P(\\e*-e \\ WyftteJoW + y/ne*, 



> 



n , 



r n -e \\ 2 6 n 



Cm 1 / 2 -*' 



||v^Vo|| + V^2n > V^IIC " \\(S min /2 - e ln 

2 



K 



\\y/nP n £ \\ + Vne 2n 



> SminK 12 Cl n l l 2 ^ + o P {n l l 2 ^- 
K 



where $\delta_{min} > 0$ is the smallest eigenvalue of $\tilde I_0$. All the above inequalities follow
from the fact that $\|\theta_n^* - \theta_0\| > Kn^{-\psi}$ for some $K > M$ and $\epsilon_{1n} = o_P(1)$.
The term $II_2$ is shown to converge to zero by the following contradiction
argument. Assuming that the event $\{(S_n(\theta_n^*) - S_n(\theta_0)) > -C_1 n^{1-2\psi}\}$
holds, we have $|S_n(\theta_n^*) - S_n(\hat\theta_n)| = S_n(\hat\theta_n) - S_n(\theta_n^*) < S_n(\hat\theta_n) - S_n(\theta_0) + C_1 n^{1-2\psi}$.
Note that (55) and the consistency of $\hat\theta_n$ imply $S_n(\theta_0) - S_n(\hat\theta_n) = o_P(n)$.
Then, we can show that $|S_n(\theta_n^*) - S_n(\hat\theta_n)|/n = o_P(1)$, which implies
that $\theta_n^*$ is consistent by (54). This implication contradicts the other
event in $II_2$, i.e., $\{\theta_n^*$ is inconsistent$\}$. Therefore we can claim that $II_2 \to 0$.

In view of the above discussions, it remains to show that $I$ and $II_1$ converge
to zero. Note that $\|\sqrt{n}\mathbb{P}_n\tilde\ell_0\|$ in $I$ is $O_P(1)$, and so is $(\|\sqrt{n}\mathbb{P}_n\tilde\ell_0\| + \sqrt{n}\epsilon_{2n})$
in $II_1$. Therefore, by choosing sufficiently large $C_1$ and $K > M$,
while keeping the inequality $\delta_{max}M^2 < 2C_1 < \delta_{min}K^2$ valid, we show
that $I$ and $II_1$ can be arbitrarily close to zero. For example, we can take
$K = M + B$ and $C_1 = (\delta_{max}M^2 + \delta_{min}(M+B)^2)/4$ for some fixed $B > 0$
and sufficiently large $M$. This completes the proof of (56).
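
The example constants can be checked numerically. The sketch below assumes the example reads $K = M + B$ and $C_1 = (\delta_{max}M^2 + \delta_{min}(M+B)^2)/4$, so that $2C_1$ is the midpoint of $\delta_{max}M^2$ and $\delta_{min}K^2$. The function name and the numerical inputs are hypothetical and serve only as an illustration.

```python
def midpoint_c1(delta_max, delta_min, M, B):
    """Return (K, C1, ok) for the example choice of constants.

    ok reports whether delta_max*M^2 < 2*C1 < delta_min*K^2 holds.
    The eigenvalues delta_max >= delta_min > 0 are supplied by the caller.
    """
    K = M + B
    c1 = (delta_max * M**2 + delta_min * K**2) / 4
    ok = delta_max * M**2 < 2 * c1 < delta_min * K**2
    return K, c1, ok


if __name__ == "__main__":
    print(midpoint_c1(1.0, 1.0, 10.0, 1.0))
    print(midpoint_c1(2.0, 1.0, 10.0, 1.0))
```

Because $2C_1$ is a midpoint, the chain $\delta_{max}M^2 < 2C_1 < \delta_{min}K^2$ holds exactly when $\delta_{max}M^2 < \delta_{min}(M+B)^2$, which the helper makes easy to confirm for a candidate pair $(M, B)$.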
Our proof of (57) is similar to that of (56). Denote $\theta_{iS}$ as an element in
$\mathcal{S}_n$. Similarly, we have

P{6 s n G Nn) < E {P (flf G Mn and % G 7V n for some i|«S n ) } 
+E{P(^ S G AA„ C for alH|«S n )} 

< P ( max 5 n (0) < max SJO) | + P (0 iS G A/l c for all i) 

< P f max n-V 2 @ n (0) - S n (0 )) < -C 2 nV 2 ~ 2 A 

+P ( max n" 1 / 2 ^^) - S„(0 O )) > -C 2 n 1/2 - 2 ^ 
+P (0 <s G jV° for all i) 

< P ( max n-^ 2 (S n (B) - S n (0 )) < -^n 1 ' 2 ^ 

\S n nj\ n 

H{#^ is consistent}^ 

+P (maxn-WtfSnie) - S n (0 )) > -CW 72-2 ^ 
+P(0l is inconsistent) + P (0 iS G A/j£ for all i) 

< i' + ii' + in 1 + 7V', 

where $C_2$ is an arbitrary positive constant and $\theta_n^{\circ} = \arg\max_{\mathcal{S}_n \cap \mathcal{N}_n} S_n(\theta)$.

We first consider the terms $III'$ and $IV'$. Since $\theta_n^{\circ} \in \mathcal{N}_n$, we have $III' \to 0$
for any $M$ as $n \to \infty$. The term $IV'$ is computed as

(A.48) $(1 - P(\theta \in \mathcal{N}_n))^{card(\mathcal{S}_n)}$.

Since the density of $\theta$ is assumed to be bounded away from zero around $\theta_0$
and $card(\mathcal{S}_n) \ge Cn^{\psi}$, (A.48) is bounded above by

$(1 - \beta n^{-\psi}M)^{card(\mathcal{S}_n)} \le (1 - \beta MC/card(\mathcal{S}_n))^{card(\mathcal{S}_n)}$

(A.49) $\to \exp(-\beta MC)$,

for some $\beta > 0$.
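
The limit in (A.49) is the elementary fact that $(1 - a/m)^m \to \exp(-a)$ as $m \to \infty$, with $(1 - a/m)^m \le \exp(-a)$ for $0 \le a \le m$. The sketch below illustrates this numerically, writing $a$ for the product of the density lower bound, $M$ and $C$; the function name and the input values are hypothetical.

```python
import math


def all_points_miss_bound(beta, M, C, card):
    """Return ((1 - a/card)**card, exp(-a)) with a = beta * M * C.

    The first entry is the finite-sample bound on the probability that
    every sampled point misses the neighborhood; the second is its limit.
    """
    a = beta * M * C
    if not 0 <= a <= card:
        raise ValueError("need 0 <= beta*M*C <= card for a valid probability")
    return (1.0 - a / card) ** card, math.exp(-a)


if __name__ == "__main__":
    for card in (10, 100, 10_000, 1_000_000):
        finite, limit = all_points_miss_bound(beta=0.5, M=2.0, C=1.0, card=card)
        print(card, finite, limit)
```

The finite-sample value increases toward $\exp(-a)$ from below, and the limit itself can be made as small as desired by taking $M$ large, which is how this term is controlled in the final step of the proof.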

We next consider $I'$. According to (55), we can show

n- 1/2 (&(0t) _ 5 n (0 o )) 

> max (-^(0 - )'T (0 -0 )\- max {-^ - Q )'¥ n £ - A n (0)/^i} 

> - min {^(0 - Oo)'T (0 - 6 )\ - max {-^{0 - o )'pJo - A n (0)/^i}- 
Therefore, we can bound $I'$ by $I'_1 + I'_2$, where
A = P ( max {-V^(9 ~ e )'FJ - n'^A^e)} > (C 2 /2)n l l 2 - 2 A 

i' 2 = p( mm {^(e - e )%(e - e )} > c 2 n l i 2 ~ 2 A . 

Given sufficiently large C 2 /M, I[ can be arbitrarily close to zero since 
I[ < P (lIv^IP Joll > + P {n^ 2 - 2 ^ V n^) 



(A.50) < P [\\y/nP n £o\\ > ^ 



2M 



+ o P (n 1 / 2 -^) 



where the last inequality follows from the assumption that $2r > 1/2 > \psi$.
Since min A ^{ x /^(0 - 9 Q )%(9 - O )} > C 2 n l / 2 ~ 2 ^ by choosing S min M 2 > C 2 , 
I' 2 is bounded above by 



P ( min{^(0 - o )'io(0 - #o)} > C^ 1 / 2 " 2 ^ 



< 

< 

< 
< 



card(S n ) 



(A.51) 



P(V^(9 - 9 )%(9 - Q ) > C 2 n 1 ' 2 ^) 

\-p{\\e-e4<{c 2 /5 max ) l i 2 n-^) 

l-P (\\9-0o\\ < {C 2 /5 max ) l l 2 C/card{S n ) 
(1 - pC{C 2 /5 max ) l l 2 /card(S n )) card ^ 

max ! 



card(S„) 



for some $\rho > 0$. In the above, the third and fourth inequalities follow from
the assumptions that $card(\mathcal{S}_n) \ge Cn^{\psi}$ and that the density of $\theta$ is bounded
away from zero around $\theta_0$, respectively. By assuming that $2C_2 < K^2\delta_{min}$ for
some $K > M$, we can prove that $II' \to 0$ in the same manner as we show
$II \to 0$.

Let $L = \min\{K^2/2, M^2\}$. In view of (A.49), (A.50), (A.51) and the above
discussion of $II'$, by choosing sufficiently large $C_2$, $K > M$ and $C_2/M$,
while keeping the inequality $C_2 < L\delta_{min}$ valid, we can make $P(\hat\theta_n^S \in \mathcal{N}_n^c)$
arbitrarily small. For example, we can take $C_2 = M^{3/2}\delta_{min}$ and $K = M + B$,
for some fixed $B > 0$ and sufficiently large $M$. This completes the
whole proof. □
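
The closing choice of constants can also be checked directly. The sketch below assumes the example reads $C_2 = M^{3/2}\delta_{min}$, $K = M + B$ and $L = \min\{K^2/2, M^2\}$; the function name and the inputs are hypothetical.

```python
def check_c2(delta_min, M, B):
    """Return (C2, ok, C2/M) for the example constants.

    ok reports whether C2 < L * delta_min with L = min(K^2/2, M^2)
    and K = M + B; C2/M is returned because the argument also needs
    this ratio to be large.
    """
    K = M + B
    L = min(K**2 / 2, M**2)
    c2 = M**1.5 * delta_min
    return c2, c2 < L * delta_min, c2 / M


if __name__ == "__main__":
    print(check_c2(1.0, 16.0, 1.0))
    print(check_c2(1.0, 1.0, 1.0))
```

With $\delta_{min} = 1$, $M = 16$ and $B = 1$ the constraint holds and $C_2/M = 4$, whereas $M = 1$ violates it, which is consistent with the requirement that $M$ be sufficiently large.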